5.5 Synthesis
Considering the results obtained when varying the main
parameters under the different quantizations, which
model different retrieval scenarios, we can identify
how to adapt the search engine to each case.
When searching only for highly exhaustive and fully
specific XML elements (i.e. scenario: thorough,
strict), the suitable configuration of the search
engine uses weak score propagation (α = 0.1) and no
query presence factor (ϕ = 1). When searching for all
the relevant XML elements (i.e. scenario: thorough,
generalized), the suitable configuration uses
intermediate score propagation (α = 0.6) and a weak
query presence factor (ϕ = 50).
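The two recommended settings can be summarised as a small lookup table. The following is only an illustrative sketch: the names EngineConfig, CONFIG_BY_QUANTIZATION and config_for are hypothetical and do not belong to the search engine described here; only the values of α and ϕ come from the experiments above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical container for the two parameters discussed above."""
    alpha: float  # score propagation coefficient (α)
    phi: float    # query presence factor (ϕ); ϕ = 1 means "no factor" in the text


# Values reported in this section for the thorough retrieval scenario.
# The dictionary itself is an assumption made for illustration.
CONFIG_BY_QUANTIZATION = {
    "strict": EngineConfig(alpha=0.1, phi=1),        # weak propagation, no presence factor
    "generalized": EngineConfig(alpha=0.6, phi=50),  # intermediate propagation, weak factor
}


def config_for(quantization: str) -> EngineConfig:
    """Return the configuration suggested for a given quantization."""
    key = quantization.lower()
    if key not in CONFIG_BY_QUANTIZATION:
        raise ValueError(f"no recommended configuration for {quantization!r}")
    return CONFIG_BY_QUANTIZATION[key]


if __name__ == "__main__":
    for name in ("strict", "generalized"):
        cfg = config_for(name)
        print(f"{name}: alpha={cfg.alpha}, phi={cfg.phi}")
```

Such a table only records the best settings observed on average; it says nothing about intermediate scenarios or individual topics.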
Additional experiments were performed on the CAS
topics defined in INEX 2005, following the VVCAS
evaluation. The evaluation criteria of the VVCAS task
are similar to those of the CO.Thorough task. These
experiments confirm the behaviour of the search engine
with respect to the studied parameters. For each
quantization, the same configurations as for the
CO.Thorough task lead to the best results.
6 CONCLUSIONS
In this paper, we proposed a search engine for XML
retrieval. This engine is based on the addition of
contributions brought by each component of the
user’s information need. The search engine is largely
configurable so that it can be adapted to different
contexts, such as different retrieval scenarios.
Through experiments using the INEX framework, we
evaluated the influence of different parameters
on the effectiveness of the search engine. The
experiments confirm that the engine can be adapted
to respond better to different retrieval scenarios,
and they show how to adapt it to a given scenario.
However, the experiments were analysed globally
using average precision. Like the systems of other
INEX participants, our search engine has variable
effectiveness at the query level. Additional studies
have to be carried out at the query level to define
criteria that make it possible to adapt the search
engine per topic. Some works (Sigurbjörnsson et al.,
2005; Pehcevski et al., 2005) tried to define
categories among the INEX topics. It would be
interesting to study our search engine according to
these categories. Fusing the results obtained with
different configurations of our search engine is also
planned as future work.
REFERENCES
Amer-Yahia S., Lakshmanan L., Pandit S., 2004.
FleXPath: Flexible Structure and Full-Text Querying
for XML, ACM SIGMOD, Paris, pp. 83-94.
Augé J., Englmeier K., Hubert G., Mothe J., 2003.
Catégorisation automatique de textes basée sur des
hiérarchies de concepts, BDA’03, 19ièmes Journées de
Bases de Données Avancées, Lyon, pp. 69-87.
Bray T., Paoli J., Sperberg-McQueen C. M., Maler E.,
Yergeau F., 2004. Extensible Markup Language
(XML) 1.0 (Third Edition), W3C Recommendation.
Carmel D., Maarek Y. S., Mandelbrod M., Mass Y., Soffer
A., 2003. Searching XML documents via XML
fragments, 26th International Conference SIGIR,
Toronto, pp. 151-158.
Clark J., DeRose S., 1999. XML Path Language (XPath),
W3C Recommendation.
Crouch C. J., Apte S., Bapat H., 2003. An Approach to
Structured Retrieval Based on the Extended Vector
Model, 2nd INEX Workshop, Dagstuhl, pp. 89-93.
Fuhr N., Großjohann K., 2004. XIRQL: An XML query
language based on information retrieval concepts,
ACM TOIS, vol. 22, Issue 2, pp. 313-356.
Fuhr N., Malik S., Lalmas M., 2003. Overview of the
INitiative for the Evaluation of XML Retrieval
(INEX) 2003, 2nd INEX Workshop, Dagstuhl, pp. 1-7.
Geva S., 2005. GPX – Gardens Point XML Information
Retrieval at INEX 2004, LNCS 3493, INEX’04, 3rd
International Workshop, Dagstuhl, pp. 211-223.
Hubert G., 2005. A voting method for XML retrieval,
LNCS 3493, INEX’04, 3rd International Workshop,
Dagstuhl, pp. 183-196.
Kazaï G., Lalmas M., 2005. INEX 2005 Evaluation
Metrics, Pre Proceedings of the 4th INEX Workshop,
pp. 401-406.
Liu S., Zou O., Chu W. W., 2004. Configurable indexing
and ranking for XML information, 27th International
Conference SIGIR, Sheffield, pp. 88-95.
Ogilvie P., Callan J., 2003. Using Language Models for
Flat Text Queries in XML Retrieval, 2nd INEX
Workshop, Dagstuhl, pp. 12-18.
Pehcevski J., Thom J. A., Tahaghoghi S. M. M., 2005.
Hybrid XML Retrieval Revisited, LNCS 3493,
INEX’04, 3rd International Workshop, Dagstuhl,
pp. 153-167.
Piwowarski B., Vu H.-T., Gallinari P., 2003. Bayesian
Networks and INEX'03, 2nd INEX Workshop,
Dagstuhl, pp. 33-37.
Ponte J. M., Croft W. B., 1998. A Language Modeling
Approach to Information Retrieval, 21st International
Conference SIGIR, Melbourne, pp. 275-281.
Salton G., Wong A., Yang C. S., 1975. A vector space
model for automatic indexing, Communications of the
ACM, vol. 18, Issue 11, pp. 613-620.
Sigurbjörnsson B., Kamps J., de Rijke M., 2005. Mixture
Models, Overlap, and Structural Hints in XML
Element Retrieval, LNCS 3493, INEX’04, 3rd
International Workshop, Dagstuhl, pp. 196-210.