An Integrated System for Accessing the Digital Library of the Parliament of Andalusia: Segmentation, Annotation and Retrieval of Transcriptions and Videos

Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete and Carlos J. Martín-Dancausa

Departamento de Ciencias de la Computación e Inteligencia Artificial, E.T.S.I. Informática y de Telecomunicación, Universidad de Granada, C.P. 18071, Granada, Spain

Abstract. In this paper, an integrated system for searching the official documents published by the Parliament of Andalusia is presented. It uses the internal structure of these documents in order to return, for a given query, not only complete documents but also parts of them. Additionally, as the sessions of the Parliament are recorded on video, the system can also return, along with the text, the pieces of video associated with the retrieved elements. Offering this service requires several tools: PDF converters, video segmentation and annotation tools, and a search engine, each with its corresponding graphical interface for interacting with the user. This paper describes the elements that make up the system.

1 Introduction

The Parliament of Andalusia was established in 1982. Since then, this institution has generated a collection of electronic documents in PDF format, called session diaries and official gazettes, which are published on the www.parlamentodeandalucia.es site. Moreover, the sessions are recorded on video, so the digital library of the Parliament is complemented with the videos in addition to the transcriptions.

In the session diaries, and therefore in the videos, we can find all the contributions of the members of parliament, as well as all the agreements reached in the plenary sessions, the Permanent Delegation and the Commissions, whether passing laws or holding informative sessions with members of the regional Government.

Taking into account that each session held in the Parliament has a very well-defined structure, and that each document is an exact replica of its corresponding session, the content of each PDF is organized according to a strict and rich structure that may be useful for retrieval.

In the field of Information Retrieval (IR) [1], when the retrieval mechanism is able to use the structural information contained in the documents, we are dealing with so-called structured IR [2]. In this setting, the internal organization of the documents is used to return to the user, instead of a whole relevant document, only those parts of it that are relevant. This can save the user a significant amount of time.

Thanks to the internal organization of the text of the session diaries and the official gazettes, the legislative collection of the Parliament of Andalusia can be studied from a structured IR perspective. Moreover, the videos, or the pieces of them associated with the document units, could also be delivered to the user, so that she/he can either read the text or watch the video. This would be an added value for a search engine accessing this digital library. In this paper, we briefly present the model on which this search engine is based, as well as the user interfaces of the search application.

de Campos L. M., Fernández-Luna J. M., Huete J. F. and Martín-Dancausa C. J. (2008). An Integrated System for Accessing the Digital Library of the Parliament of Andalusia: Segmentation, Annotation and Retrieval of Transcriptions and Videos. In Proceedings of the 8th International Workshop on Pattern Recognition in Information Systems, pages 38-47. SciTePress.

But the infrastructure needed to reach this objective is quite complex, as both text and video must be processed properly. First of all, the collection of PDF documents must be converted into XML, so that their internal structure can be used by a search engine. A second stage is the processing of the videos, because a link must be established between a document (and its parts) and the video of the same session. This requires synchronizing the text contained in the session diary with the corresponding video. The video is partitioned into segments of similar content by detecting their boundaries: whenever the camera changes, a new segment is created. Finally, these segments are associated with the textual transcription of the speeches. Segmentation and annotation tools have been developed for these purposes.

Most of the segmentation algorithms found in the literature are designed for general videos [8]. This means that they are complex algorithms prepared to detect segment boundaries under all conditions. In our context, however, a simple algorithm based on histogram comparison can work very well, as is indeed the case. In this paper we briefly outline the algorithm itself and how it has been improved, as well as the main features of the segmentation and annotation tools.

Therefore, in this paper we describe an integrated system for searching the documents of the digital library of the Parliament of Andalusia, composed of the PDF to XML converter, the video segmentation and annotation tools, and the search engine, as well as the way in which the user interacts with them. The paper is organized as follows: the next section introduces the general architecture of the integrated system. Section 3 explains the converter of PDF documents to XML. The segmentation algorithm and the annotation tool are described in Section 4. The search engine and the model on which it is based are discussed in Section 5. Next, the paper outlines the search interface (Section 6), and it ends with some conclusions and future research lines (Section 7).

2 General Overview of the System Architecture

To give an overview of the elements that make up the system, we first present its general architecture. Figure 1 shows a graphical representation of the system.

The Parliament publishes the official documents in PDF format, which is not the most appropriate format for structured retrieval. Therefore, the first step is to transform each PDF file into an XML file, in which the internal organization of these types of documents is captured and represented by means of XML tags, so that all the content of the documents is structured. This format allows the search engine to access the most appropriate units of textual information for a given query.

With respect to the videos of the sessions, as the main objective is to give the user access not only to the most appropriate unit of text but also to the piece of video associated with that text, all the units of the XML documents must be synchronized with their corresponding portions of video. To achieve this, a prerequisite is the division of the videos into segments. In the case of this regional chamber, as there are only four cameras recording the sessions, the production of the recordings is really simple, so the segments coincide with the changes of camera. A segmentation module is in charge of this task, producing the segments and their keyframes (the automatically obtained segmentation can be edited manually). The next step is the synchronization of text and video. Using an annotation tool, an expert user visually associates each segment with the corresponding XML tag containing the transcription of the audio of that video segment. The output of this process is the XML of a session with time tags indicating where the beginning and the end of the corresponding speech are located in the video.

In order to ensure fast downloading and direct access to the segments of the videos, we use the Flash format for them. Therefore, another converter is applied to transform the videos from their original AVI format into Flash format. This converter also adds time tags to the videos.

The search engine, which is built on Garnata [4], an Information Retrieval System for structured documents based on Bayesian Networks and Influence Diagrams [7], is in charge of retrieving the relevant parts of the documents given a query. This piece of software has a Web interface (http://irutai.ugr.es/WebParlamento/index.php) for interacting with the user. She/he formulates a query using a form, where the information need can be described. The search engine takes this query and computes the relevance of all the elements contained in the XML collection. Finally, Garnata shows all the structural units in the Web page, sorted by decreasing relevance degree. For each result, the text of the retrieved element, a link to the PDF document, a link to the XML file and, if it has an associated video, a link to play the corresponding portion of video are shown.

3 The PDF to XML Converter

In order to be able to perform structured retrieval, we have to convert the digital library of the Parliament from PDF format into XML. This conversion extracts the text contained in the PDF files and places it in the corresponding parts of the well-defined structure of these official documents.

The conversion process has the following steps. Firstly, we transform the PDF documents into text files using an external tool called pdftotext. After that, the converter, developed in Java, processes the text of these files with a lexical analyzer and a syntax analyzer, extracting the tokens and applying a grammar defined over these tokens. To create the XML files, we use the DOM API, which builds a tree in memory whose nodes are XML tags: when a token is detected, a new node or group of nodes is created in the DOM tree, generating the whole hierarchical structure of the XML format. Finally, the tree is converted into the corresponding XML file, which is then validated.
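The following Python fragment sketches the lexical analysis, parsing and DOM-building steps. It is not the authors' Java converter: it assumes the PDF has already been turned into text with pdftotext (for example `pdftotext diary.pdf diary.txt`), and the token patterns (`ITEM:` lines opening an agenda item, upper-case `SPEAKER:` lines opening a speech) and the element names `session`, `item` and `speech` are hypothetical stand-ins for the real layout of the session diaries. The tree is built with the standard `xml.dom.minidom` module; the final validation step is omitted.

```python
import re
from xml.dom.minidom import Document

# Hypothetical token patterns: the real grammar of the session diaries is not given here.
ITEM_RE = re.compile(r"^ITEM:\s*(?P<title>.+)$")
SPEECH_RE = re.compile(r"^(?P<speaker>[A-ZÁÉÍÓÚÑ ]+):\s*(?P<text>.*)$")


def tokenize(lines):
    """Lexical step: classify each non-empty line as an ITEM, SPEECH or TEXT token."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if m := ITEM_RE.match(line):
            yield ("ITEM", m.group("title"))
        elif m := SPEECH_RE.match(line):
            yield ("SPEECH", m.group("speaker").strip(), m.group("text"))
        else:
            yield ("TEXT", line)


def build_session(lines, session_id):
    """Syntax step (session -> item*, item -> speech*, speech -> text*), building a DOM tree."""
    doc = Document()
    session = doc.createElement("session")
    session.setAttribute("id", session_id)
    doc.appendChild(session)
    item = speech = None
    for token in tokenize(lines):
        kind = token[0]
        if kind == "ITEM":
            item = doc.createElement("item")
            item.setAttribute("title", token[1])
            session.appendChild(item)
            speech = None
        elif kind == "SPEECH":
            if item is None:  # a speech before any item opens an untitled item
                item = doc.createElement("item")
                session.appendChild(item)
            speech = doc.createElement("speech")
            speech.setAttribute("speaker", token[1])
            speech.appendChild(doc.createTextNode(token[2]))
            item.appendChild(speech)
        elif speech is not None:  # continuation lines extend the open speech
            speech.appendChild(doc.createTextNode(" " + token[1]))
        # TEXT tokens outside any speech are ignored in this sketch
    return doc


if __name__ == "__main__":
    sample = ["ITEM: Debate general", "PRESIDENTE: Se abre la sesión.",
              "Continúa el texto.", "", "ITEM: Ruegos", "DIPUTADA GARCÍA: Gracias."]
    print(build_session(sample, "8-001").toprettyxml(indent="  "))
```

A real grammar would need many more token types, but each recognized token still creates one node, or a small group of nodes, in the DOM tree.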


elements. If we denote by H this vector and by H[i] the number of occurrences of grey level i in the image, then, to determine whether there has been a change between two shots S1 and S2, with histograms H1 and H2 respectively, we compute the absolute difference of both vectors for each grey level, |H1[i] − H2[i]|. Summing these differences over all grey levels gives a scalar measure of the difference between both shots:

$$D(S_1, S_2) = \sum_{i} \lvert H_1[i] - H_2[i] \rvert$$

If this value is greater than a certain threshold, then there is a change of shot.
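A minimal Python sketch of this comparison follows, assuming each shot is given as a flat list of grey levels in the range 0 to 255 (decoding frames from the video is out of scope here). The function names are illustrative.

```python
from collections import Counter


def grey_histogram(pixels, levels=256):
    """H[i] = number of pixels with grey level i."""
    counts = Counter(pixels)
    return [counts.get(i, 0) for i in range(levels)]


def histogram_difference(h1, h2):
    """Scalar difference D = sum over grey levels of |H1[i] - H2[i]|."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def is_cut(h1, h2, threshold):
    """A change between the two shots is declared when D exceeds the threshold."""
    return histogram_difference(h1, h2) > threshold
```

For two ten-pixel shots, one entirely black and one half black and half white, the difference is 5 + 5 = 10.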

This is a really simple algorithm: a set of shots is considered to belong to the same segment if the differences between their histograms are low. However, experimenting with the videos of parliamentary sessions, we realised that the differences between the histograms of shots are sometimes high even when there is no camera change, so mistakes might be made. A solution is to apply a convolution filter to the histograms, which replaces each element of the vector by a weighted sum of its closest elements:

$$\tilde{H}_1[i] = 0.1\,H_1[i-2] + 0.2\,H_1[i-1] + 0.4\,H_1[i] + 0.2\,H_1[i+1] + 0.1\,H_1[i+2]$$

An important decision, which clearly has a great influence on the performance of the segmentation, is the selection of the threshold value. It depends on the resolution of the video, as for shots with a higher resolution the difference between their histograms will be proportionally larger. Moreover, images with a higher number of colours also present a higher difference between shots, so the number of tones contained in the images must be considered too. Therefore, the threshold used in our algorithm is defined as

$$T = \frac{\text{Width} \times \text{Height} \times \text{No. of colours}}{K}$$

where K is a parameter that is decisive for obtaining an optimal segmentation.
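The smoothing filter and the threshold formula can be sketched as follows. The paper does not say how the filter treats the first and last two grey levels; this sketch assumes that bins outside the histogram count as zero. Any resolution, number of colours or K passed to `threshold` in an example is purely illustrative.

```python
# Kernel weights for offsets -2, -1, 0, +1, +2, as in the filter above.
WEIGHTS = (0.1, 0.2, 0.4, 0.2, 0.1)


def smooth(hist):
    """Convolve a histogram with the 5-tap kernel (out-of-range bins count as zero)."""
    n = len(hist)
    out = []
    for i in range(n):
        total = 0.0
        for offset, w in zip(range(-2, 3), WEIGHTS):
            j = i + offset
            if 0 <= j < n:
                total += w * hist[j]
        out.append(total)
    return out


def smoothed_difference(h1, h2):
    """Sum of absolute differences between the two smoothed histograms."""
    return sum(abs(a - b) for a, b in zip(smooth(h1), smooth(h2)))


def threshold(width, height, n_colours, k):
    """T = Width * Height * No. of colours / K."""
    return width * height * n_colours / k
```

For example, for `[0, 0, 10, 0, 0]` and `[0, 0, 0, 10, 0]` (the same content shifted by one grey level) the raw difference is 20, but `smoothed_difference` returns 7.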

For the type of videos considered, the value of the K parameter has been obtained empirically by studying several of them. The process was the following: first, we obtained the difference for each pair of contiguous shots in each video; secondly, we manually located the cuts (changes of scene) that the segmentation algorithm should detect; and finally, after studying the values of the differences for the shots at which there is a cut, we selected a value for K such that the threshold is low enough to detect all the real cuts and high enough not to detect cuts that do not exist. With a threshold of 16,000, all the cuts are detected, and nothing except real cuts is detected in the videos of the Parliament.
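The calibration procedure can be expressed as a small computation over the manually labelled cuts, sketched below under the assumption that a cut is declared when the difference is strictly greater than the threshold. Picking the midpoint of the admissible interval, as in the test values, is a hypothetical choice; the paper does not say how the final value was chosen within that range, and the example numbers are made up.

```python
def threshold_interval(differences, cut_positions):
    """differences[i]: histogram difference between shot i and shot i + 1.
    cut_positions: indices i manually marked as real cuts.
    Returns (low, high) such that any threshold T with low <= T < high detects
    every marked cut (difference > T) and nothing else (difference <= T)."""
    cuts = set(cut_positions)
    cut_diffs = [d for i, d in enumerate(differences) if i in cuts]
    other_diffs = [d for i, d in enumerate(differences) if i not in cuts]
    low = max(other_diffs, default=0)
    high = min(cut_diffs, default=float("inf"))
    if low >= high:
        raise ValueError("no single threshold separates the marked cuts from the rest")
    return low, high


def k_from_threshold(t, width, height, n_colours):
    """Invert T = Width * Height * No. of colours / K to obtain K."""
    return width * height * n_colours / t
```

For differences `[100, 5000, 120, 90, 8000]` with cuts marked at positions 1 and 4, the admissible thresholds satisfy 120 ≤ T < 5000.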

This basic segmentation algorithm works properly, but not efficiently. As the videos of the sessions of the Parliament of Andalusia are very long (about 5 hours), the segmentation speed must be improved without worsening its effectiveness. The first improvement is to reduce the number of shots to be considered: instead of analysing every pair of consecutive shots, we skip s shots between each pair studied. When a cut is found between the two shots of a pair, the segmentation is then refined to locate exactly where the cut occurs. This process is much faster than comparing every single pair of shots and also yields the optimal result.
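The skip-and-refine strategy can be sketched as a coarse pass over pairs of shots s + 1 positions apart, followed by a fine scan of consecutive pairs only inside the windows whose coarse difference exceeds the threshold. The function names are illustrative and histograms are plain lists.

```python
def l1(h1, h2):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def find_cuts(histograms, threshold, s, diff=l1):
    """Coarse pass: compare shot i with shot i + s + 1 (skipping s shots in between).
    Fine pass: inside each window whose coarse difference exceeds the threshold,
    compare consecutive shots to locate the exact cut(s).
    Returns the index of the first shot of each new segment."""
    step = s + 1
    n = len(histograms)
    cuts = []
    for start in range(0, n - 1, step):
        end = min(start + step, n - 1)
        if diff(histograms[start], histograms[end]) > threshold:
            for i in range(start, end):
                if diff(histograms[i], histograms[i + 1]) > threshold:
                    cuts.append(i + 1)
    return cuts
```

This coarse pass assumes a cut changes the histogram between the two endpoints of a window: if the camera switches away and back within s + 1 shots, the coarse comparison can miss it, so s should stay small relative to the shortest expected segment.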

A second optimization is related to the size of the image. If the difference of histograms computed over the full image is enough to distinguish shots, we can suppose that the histogram of only a portion of the image may offer enough information to perform this task. This reduction improves the efficiency of the process, as fewer computations are needed. In the case of the videos of the Parliament of Andalusia, where the location of the cameras is known, we can determine the part of the image that is least affected by movement. If we divide the image into four quadrants, the most appropriate section is the lower left quadrant, so only the pixels of this area are used to compute the histogram.
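Restricting the histogram to the lower left quadrant amounts to cropping the frame before counting, as in this sketch, which assumes a frame is a list of pixel rows with row 0 at the top (a hypothetical representation).

```python
def lower_left_quadrant(frame):
    """Pixels of the lower-left quadrant of a frame given as a list of rows (row 0 at the top)."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    # rows from the middle to the bottom, columns from the left edge to the middle
    return [p for row in frame[height // 2:] for p in row[: width // 2]]


def quadrant_histogram(frame, levels=256):
    """Grey-level histogram computed only over the lower-left quadrant."""
    hist = [0] * levels
    for p in lower_left_quadrant(frame):
        hist[p] += 1
    return hist
```

Only about a quarter of the pixels are counted, which is where the reduction in computation comes from.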

Once the automatic segmentation of a video has finished, the software offers the possibility of editing the segmentation manually. The output of the automatic process is a set of segments, each represented by a keyframe. The user may need to adjust the segmentation so that the subsequent annotation process is more accurate. Therefore, the user is allowed to edit the segments, combining them if they are contiguous, or dividing a segment in two. In the application, all the segments found by the algorithm, more specifically the keyframe of each segment, are shown in a window (Figure 2). If we click on one of them, all the shots it contains are shown in a separate window. The user can edit the segments by means of submenus activated by the left mouse button. A viewer that can play any segment is also included.
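The two editing operations offered by the tool, combining contiguous segments and dividing a segment in two, can be modelled on a list of segments. In this sketch a segment is a hypothetical half-open range of shot indices [start, end); keyframes and the graphical interface are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int  # first shot of the segment (inclusive)
    end: int    # shot after the last one (exclusive)


def merge(segments, i):
    """Combine segment i with the contiguous segment i + 1."""
    a, b = segments[i], segments[i + 1]
    if a.end != b.start:
        raise ValueError("only contiguous segments can be combined")
    return segments[:i] + [Segment(a.start, b.end)] + segments[i + 2:]


def split(segments, i, at):
    """Divide segment i in two at shot `at` (start < at < end)."""
    s = segments[i]
    if not s.start < at < s.end:
        raise ValueError("split point must fall strictly inside the segment")
    return segments[:i] + [Segment(s.start, at), Segment(at, s.end)] + segments[i + 1:]
```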

When the manual editing is over, the user is ready to carry out the annotation stage. The input of this process is the set of segments found in the video of a parliamentary session and the transcription of the speeches given in the chamber for that video. This transcription is represented by means of an XML document, which contains the structure of the session as well as the text itself. The output is the XML document containing the transcription synchronized with the video by means of time tags in the elements of the document, so that the segment of the video related to a specific text can be easily accessed. The annotation consists of manually associating segments with the corresponding elements in the XML document, so that each tag has a link to its corresponding part of the video.

Fig. 2. User interface for the segmentation tool.

Fig. 3. User interface for the annotation tool.

Figure 3 shows the user interface of the annotation tool. It is composed of four windows. The left window shows the tree representation of an XML document containing a session diary. If a leaf node is clicked, the text it contains is shown in the upper central window. The window below contains the segments found in the first part of the process. Finally, a player is included in the right part of the interface to help with the annotation. The annotation process is as follows: the user selects a segment in the video, then finds the node in the XML document containing the transcription of the audio of that segment, and associates the former with the latter by means of a drag and drop action. These steps are repeated until every segment has been assigned a node of the document. Internally, associating a segment with an XML element of the document adds a pair of attributes to the corresponding tag, containing the beginning and ending times of the segment. This information is enough to access the portion of the video at retrieval time. Once all the segments have been assigned to leaf nodes of the XML tree, and therefore all the affected tags have been complemented with temporal attributes linking the text with the video, the times must be propagated to the upper nodes until the root node is reached.
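The annotation attributes and their propagation towards the root can be sketched with Python's `xml.etree.ElementTree`. The attribute names `begin` and `end`, the use of seconds, and the rule that an inner node spans from the earliest beginning to the latest end of its children are assumptions; the paper only states that the times are propagated upwards.

```python
import xml.etree.ElementTree as ET


def annotate(element, begin, end):
    """Store a segment's beginning and ending times (in seconds) as attributes of an element."""
    element.set("begin", f"{begin:.2f}")
    element.set("end", f"{end:.2f}")


def propagate_times(node):
    """Post-order pass: each inner node spans from the earliest beginning to the latest end
    of its annotated descendants. Returns (begin, end), or None if nothing below is annotated."""
    spans = [span for span in (propagate_times(child) for child in node) if span is not None]
    if not spans:
        # leaf (or unannotated subtree): use the node's own times if the annotator set them
        if "begin" in node.attrib:
            return float(node.get("begin")), float(node.get("end"))
        return None
    begin = min(b for b, _ in spans)
    end = max(e for _, e in spans)
    annotate(node, begin, end)
    return begin, end


if __name__ == "__main__":
    root = ET.fromstring("<session><item><speech/><speech/></item><item><speech/></item></session>")
    for speech, (b, e) in zip(root.iter("speech"), [(0, 12.5), (12.5, 40), (40, 95)]):
        annotate(speech, b, e)
    propagate_times(root)
    print(ET.tostring(root, encoding="unicode"))
```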

5 The Search Engine: Garnata

The search engine used to retrieve the relevant material for the user is Garnata [4], an Information Retrieval System specially designed to work with structured documents in XML. This system is based on the Context-based Influence Diagram model (CID model) [3], which is supported by Influence Diagrams [5, 7]. These are probabilistic graphical models specially designed for decision problems.

An Influence Diagram (ID) provides a simple notation for creating decision models by clarifying the qualitative issues of the factors that need to be considered and how they are related, i.e. an intuitive representation of the model. It also has an associated underlying quantitative representation that measures the strength of the relationships. More formally, an influence diagram is an acyclic directed graph containing three types of nodes (decision, chance and utility) and two types of arcs (influence and informative arcs). The goal of influence diagram modeling is to choose the decision alternative that leads to the highest expected gain (utility), i.e. the optimal policy. In order to compute the solution, for each sequence of decisions, the utilities of its uncertain consequences are weighted by the probabilities that these consequences will occur.
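For a single decision, this computation reduces to weighting each consequence's utility by its probability and picking the best alternative. The sketch below shows only that one-step case (sequences of decisions are not handled); the decision and consequence labels and the numbers in the test are hypothetical.

```python
def expected_utility(decision, outcome_probs, utility):
    """EU(d) = sum over consequences c of P(c | d) * U(d, c)."""
    return sum(p * utility[(decision, c)] for c, p in outcome_probs[decision].items())


def optimal_decision(outcome_probs, utility):
    """Optimal policy for a single decision: the alternative with the highest expected utility."""
    return max(outcome_probs, key=lambda d: expected_utility(d, outcome_probs, utility))
```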

With respect to the CID model, starting from a document collection containing a set of documents, D, and the set of terms, T, used to index these documents, we assume that each document is organized hierarchically, representing structural associations of its elements, which will be called structural units. Each structural unit is composed of other, smaller structural units, except some 'terminal' or 'minimal' units, which are indivisible: they do not contain any other unit, but are composed of terms. Conversely, each structural unit, except the one corresponding to the complete document, is included in exactly one structural unit.

The chance nodes of the ID are the terms, $T_j$, and the structural units, $U_i$. Each of them has an associated binary random variable, whose values represent that the term/unit is not relevant or is relevant.

Regarding the arcs, there is an arc from a given node (either term or structural unit)

to the particular structural unit node it belongs to, expressing the fact that the relevance

of a given structural unit to the user will depend on the relevance values of the different

elements (units or terms) that comprise it.

Decision nodes, $R_i$, model the decision variables. There is one node for each structural unit, representing the decision of whether or not to return the corresponding structural unit to the user, taking the values 'retrieve the unit' or 'do not retrieve the unit'. Finally, there are utility nodes, $V_i$: we also consider one utility node for each structural unit, which measures the utility of the corresponding decision.

In addition to the arcs between chance nodes, we consider two different sets of arcs. In order to represent that the utility function obviously depends on the decision made and on the relevance value of the structural unit considered, we use arcs from each chance node $U_i$ and decision node $R_i$ to the utility node $V_i$. Another important set of arcs are those going from the unit in which $U_i$ is contained to $V_i$, which represent that the utility of the decision about retrieving the unit $U_i$ also depends on the relevance of the unit that contains it. Finally, the utility function associated with each node $V_i$ must be defined. Figure 4 shows an example of the topology of the CID model.
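The topology described above can be generated mechanically from the document structure. The following sketch builds the set of arcs from a mapping of units to sub-units and of minimal units to their index terms; the node naming scheme (`T:`, `U:`, `R:`, `V:` prefixes) is a convention assumed here, and the probability and utility tables are not built.

```python
def cid_arcs(children, terms):
    """children: unit -> list of its sub-units; terms: minimal unit -> list of index terms.
    Returns the set of (source, target) arcs of the CID topology."""
    parent = {child: unit for unit, subunits in children.items() for child in subunits}
    units = set(children) | set(terms) | set(parent)
    arcs = set()
    for unit, unit_terms in terms.items():
        for term in unit_terms:
            arcs.add((f"T:{term}", f"U:{unit}"))  # term -> minimal unit containing it
    for child, container in parent.items():
        arcs.add((f"U:{child}", f"U:{container}"))  # unit -> unit containing it
    for unit in units:
        arcs.add((f"U:{unit}", f"V:{unit}"))  # relevance of the unit -> its utility
        arcs.add((f"R:{unit}", f"V:{unit}"))  # retrieval decision -> its utility
        if unit in parent:
            arcs.add((f"U:{parent[unit]}", f"V:{unit}"))  # relevance of the container -> utility
    return arcs
```

A term shared by several minimal units simply appears in several lists, which yields one arc per unit containing it.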

Fig. 4. An example of the CID model.

To solve an influence diagram, the expected utility of each possible decision has to be computed, so that the decisions that maximize the expected utility can be made. In our case, the situation of interest corresponds to the information provided by the user when she/he formulates a query, Q, so we wish to compute the expected utility of each decision given the query. As in a typical decision-making problem, once the expected utilities are computed, the decision with the greatest utility is chosen: this means retrieving the structural unit $U_i$ if the expected utility of retrieving it is greater than the expected utility of not retrieving it, and not retrieving it otherwise.
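For one structural unit, the decision rule can be sketched as follows. The utility node $V_i$ depends on the decision $R_i$, on $U_i$ and on the unit that contains $U_i$, so the expected utility sums over the joint posterior of those two relevance variables given Q. Computing that posterior requires inference in the network and is assumed to be given here; the utility values in the test are hypothetical, and the unit for a complete document (which has no container) is not covered.

```python
from itertools import product

NOT_RELEVANT, RELEVANT = 0, 1
NOT_RETRIEVE, RETRIEVE = 0, 1


def expected_utility(r, joint, utility):
    """EU(R_i = r | Q) = sum over u, c of V(r, u, c) * P(U_i = u, C_i = c | Q),
    where C_i is the unit containing U_i and joint[(u, c)] is that posterior."""
    return sum(utility[(r, u, c)] * joint[(u, c)]
               for u, c in product((NOT_RELEVANT, RELEVANT), repeat=2))


def should_retrieve(joint, utility):
    """Retrieve U_i only if retrieving has a strictly greater expected utility."""
    return expected_utility(RETRIEVE, joint, utility) > expected_utility(NOT_RETRIEVE, joint, utility)
```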

6 The Search Interface: Interacting with the User

The user interface of the search engine is a web page (http://irutai.ugr.es/WebParlamento), where a user who wants to obtain information from the legislative collection of the Parliament of Andalusia can express her/his information needs by means of a form (see Figure 5). The search parameters are the legislature number, the kind of document (session diaries or official gazettes), publishing dates, range of documents, and finally, the query text. It is also possible to indicate how the results are arranged: a) Only one result per document: the system shows only one result per document, which should correspond to the best entry point for starting to read the relevant text in the document. b) All the results grouped by document: the search engine returns, for each document, all its relevant units sorted by their relevance degree. c) All the results: all the relevant units, without any grouping, are presented to the user in decreasing order of their relevance degree.
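The three arrangements can be sketched as operations on a list of (document, unit, relevance) triples. In this sketch the best entry point of a document is approximated by its highest-scoring unit, which is an assumption, and grouped documents are ordered by their best unit.

```python
def arrange(results, mode):
    """Arrange (document, unit, relevance) triples as selected in the search form.
    mode: 'one' (one result per document), 'grouped' (all results grouped by document)
    or 'all' (all results in decreasing order of relevance)."""
    ranked = sorted(results, key=lambda r: r[2], reverse=True)
    if mode == "all":
        return ranked
    by_document = {}
    for result in ranked:  # ranked order, so each document's list is already sorted
        by_document.setdefault(result[0], []).append(result)
    if mode == "grouped":
        # dicts keep insertion order, so documents appear ordered by their best unit
        return list(by_document.values())
    if mode == "one":
        # assumption: the best entry point is approximated by the highest-ranked unit
        return [units[0] for units in by_document.values()]
    raise ValueError(f"unknown mode: {mode!r}")
```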

Fig. 5. User interface for searching.

Fig. 6. Results of a query.

Once the search engine has computed the relevance degree of the structural units of the collection, the results are presented on a second web page in groups of ten. For each result, the page provides a brief portion of the text of the structural unit, a link to the corresponding PDF document that contains this unit, and a link to the XML document displayed in HTML format. Moreover, if a unit has an associated video, there is a link to this video, so the user is able to watch the portion of the video corresponding to this structural unit. Figure 6 shows an example of this presentation of results when the 'All the results grouped by document' option has been selected.

7 Conclusions and Further Research

This paper has presented an integrated software system to access, from a structured retrieval perspective, the documents and videos generated by the Parliament of Andalusia. It is composed of all the tools needed to process these types of media (a PDF to XML converter, video segmentation and annotation tools, and a search engine), as well as the graphical interfaces to interact with the user. We believe the system has yielded good results so far, but it is still at an experimental stage.

As future work, we plan to replace the video segmentation and annotation stages with an automatic synchronization of audio and text. We are also working on improving the retrieval capacity of the CID model.

Acknowledgements

Work jointly supported by the Spanish Ministerio de Educación y Ciencia (TIN2005-02516), the Consejería de Innovación, Ciencia y Empresa de la Junta de Andalucía (TIC-276), and the Spanish research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018).

References

1. R. Baeza-Yates, B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.

2. Y. Chiaramella. Information retrieval and structured documents. Lectures on IR, Springer, 286–309, 2001.

3. L. M. de Campos, J. M. Fernández-Luna, J. F. Huete. Using context information in structured document retrieval: An approach using Influence Diagrams. IP&M, 40(5), 829–847, 2004.

4. L. de Campos, J. M. Fernández-Luna, J. Huete, and A. Romero. Garnata: An information retrieval system for structured documents based on probabilistic graphical models. In Proceedings of the IPMU'06 conference, 1024–1031, 2006.

5. F. V. Jensen. Bayesian Networks and Decision Graphs. Springer-Verlag, 2001.

6. I. Koprinska, S. Carrato. Temporal video segmentation: A survey. Signal Processing: Image Communication, 16(5), 477–500, 2001.

7. R. D. Shachter. Probabilistic inference and influence diagrams. Oper. Res., 36(4), 589–604, 1988.

8. F. Camastra, A. Vinciarelli. Video Segmentation and Keyframe Extraction. In Advanced Information and Knowledge Processing. Springer, 413–430, 2007.

9. Y. Lu, W. Gao, F. Wu. Automatic video segmentation using a novel background model. In Proc. of ISCAS, 2002.

10. A. Jain, S. Chaudhuri. A Fast Method for Textual Annotation of Compressed Video. In Proc. of ICVGIP, 2002.

11. A. Hanjalic, R. Lagendijk, J. Biemond. Automated Segmentation of Movies into Logical Story Units. Information Systems, 31(7), 638–658, 2006.