of the document. When a segment is associated with an XML element of the
document, we add a pair of attributes to the corresponding tag, containing the
start and end times of the segment. This information is enough to access
the corresponding portion of the video at retrieval time. Once all the segments have
been assigned to leaf nodes of the XML tree, and all the affected tags have therefore
been complemented with temporal attributes linking the text with the video, the times
must be propagated to the upper nodes until the root node is reached.
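The propagation step can be illustrated with a minimal sketch. It assumes, since the text does not fix these details, that the temporal attributes are named begin and end, that times are stored in seconds, and that an inner node spans from the earliest start to the latest end among its descendants. The element names in the sample document are also hypothetical.

```python
import xml.etree.ElementTree as ET


def propagate_times(node):
    """Fill in begin/end on inner nodes bottom-up; return (begin, end) or None."""
    # Collect the time spans of all children that carry temporal information.
    spans = [s for s in (propagate_times(child) for child in node) if s is not None]
    # A node that was directly aligned with a segment contributes its own span.
    if "begin" in node.attrib and "end" in node.attrib:
        spans.append((float(node.get("begin")), float(node.get("end"))))
    if not spans:
        return None  # no segment below this node
    # Assumption: a parent covers everything from the earliest start to the latest end.
    begin = min(b for b, _ in spans)
    end = max(e for _, e in spans)
    node.set("begin", str(begin))
    node.set("end", str(end))
    return begin, end


# Hypothetical document: only the leaf <p> elements were aligned with video segments.
doc = ET.fromstring(
    "<session>"
    '<intervention><p begin="0.0" end="12.5">...</p><p begin="12.5" end="30.0">...</p></intervention>'
    '<intervention><p begin="30.0" end="41.2">...</p></intervention>'
    "</session>"
)
propagate_times(doc)
```

After the call, every ancestor of an aligned leaf, including the root, carries its own pair of attributes, so any element returned by the search engine can be mapped to a portion of the video.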
5 The Search Engine: Garnata
The search engine used to retrieve the material relevant to the user is Garnata [4], an
Information Retrieval System specially designed to work with structured documents
in XML. This system is based on the Context-based Influence Diagram model (CID
model) [3], which is supported by Influence Diagrams [5, 7]. These are probabilistic
graphical models specially designed for decision problems.
An Influence Diagram (ID) provides a simple notation for creating decision models
by clarifying the qualitative aspects of the factors that need to be considered and
how they are related, i.e. it gives an intuitive representation of the model. It also has
an associated underlying quantitative representation that measures the strength of the
relationships. More formally, an influence diagram is a directed acyclic graph containing
three types of nodes (decision, chance and utility) and two types of arcs (influence and
informative arcs). The goal of influence diagram modeling is to choose the decision
alternative that will lead to the highest expected gain (utility), i.e. the optimal policy. To
compute the solution, for each sequence of decisions, the utilities of its uncertain
consequences are weighted by the probabilities that these consequences will
occur.
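The weighting of utilities by probabilities can be sketched for the simplest possible case: one decision and one chance variable. The probabilities and utilities below are invented for illustration and are not taken from the CID model. The function names are also hypothetical.

```python
def expected_utility(decision, prob, utility):
    """Weight the utility of each uncertain outcome by its probability."""
    return sum(p * utility[(decision, outcome)] for outcome, p in prob.items())


def best_decision(decisions, prob, utility):
    """Return the alternative with the highest expected utility."""
    return max(decisions, key=lambda d: expected_utility(d, prob, utility))


# Illustrative numbers only: a single chance variable with two outcomes.
prob = {"relevant": 0.7, "not_relevant": 0.3}
utility = {
    ("retrieve", "relevant"): 1.0,
    ("retrieve", "not_relevant"): -0.5,
    ("skip", "relevant"): -1.0,
    ("skip", "not_relevant"): 0.2,
}

choice = best_decision(["retrieve", "skip"], prob, utility)
```

With these numbers the expected utility of retrieving is 0.7 · 1.0 + 0.3 · (−0.5) = 0.55, against −0.64 for skipping, so the chosen alternative is to retrieve. A full influence diagram generalizes this to sequences of decisions and to probabilities that depend on other nodes.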
With respect to the CID model, we start from a document collection containing a
set of documents, D, and the set of terms, T, used to index these documents. We
assume that each document is organized hierarchically, representing structural
associations between its elements, which will be called structural units. Each structural
unit is composed of other, smaller structural units, except for some ‘terminal’ or ‘minimal’
units, which are indivisible: they do not contain any other unit, but are composed of
terms. Conversely, each structural unit, except the one corresponding to the complete
document, is included in exactly one structural unit.
The chance nodes of the ID are the terms, $T_j$, and the structural units, $U_i$. Each has
an associated binary random variable, whose values are ‘the term/unit is not relevant’ and
‘the term/unit is relevant’.
Regarding the arcs, there is an arc from a given node (either term or structural unit)
to the particular structural unit node it belongs to, expressing the fact that the relevance
of a given structural unit to the user will depend on the relevance values of the different
elements (units or terms) that comprise it.
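As an illustration of the topology just described, the following sketch builds the arcs between chance nodes for a toy document. The Unit class, the arcs function, and the unit and term names are all hypothetical; this is not Garnata's actual data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """A structural unit: terminal units hold terms, the others hold smaller units."""
    name: str
    terms: list = field(default_factory=list)
    children: list = field(default_factory=list)


def arcs(unit):
    """Yield (source, target) arcs from each element to the unit that contains it."""
    for term in unit.terms:
        yield (term, unit.name)       # term -> terminal unit it indexes
    for child in unit.children:
        yield (child.name, unit.name)  # unit -> the single unit containing it
        yield from arcs(child)


# Toy document with two terminal units; the term "tax" occurs in both.
doc = Unit("U_doc", children=[
    Unit("U_sec1", terms=["budget", "tax"]),
    Unit("U_sec2", terms=["tax", "health"]),
])

edges = list(arcs(doc))
```

Every unit other than the root has exactly one outgoing arc, which reflects the requirement that a unit is included in only one larger unit. A term, in contrast, may point to several terminal units.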
Decision nodes, $R_i$, model the decision variables. There is one such node for each
structural unit. It represents the decision of whether or not to return
the corresponding structural unit to the user, taking the values ‘retrieve the unit’ or ‘do
not retrieve the unit’. Finally, utility nodes, $V_i$. We shall also consider one utility node