indexing which is the best non-combined method on
average. MAP improves by more than 8%, and the results
are better than those of each of the three
preceding combinations. Apart from a few cases, the
differences are statistically significant.
6 CONCLUSIONS
In this study, we considered French monolingual
information retrieval and analyzed the influence of
different indexing methods and query sizes. We
compared three indexing methods: single words,
truncated terms and lemmas. Our experiments were
carried out on the French collections of CLEF from
2000 to 2005. They show experimentally that
lemma-based indexing is the most effective single
method with respect to both mean average precision
(MAP) and high precision. The difference
is more than 7% for MAP and about 6% for P5
(statistically significant). We have also shown that it
is worthwhile to combine the results obtained using
different query sizes (that is, various sections of the
topics). By contrast, combining the various indexing
methods does not significantly improve
the results.
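As an illustration of what combining the results obtained with different query sizes can look like, the sketch below fuses two ranked runs by summing min-max normalized scores (a CombSUM-style fusion). The fusion function is an assumption made for this example and is not necessarily the one used in the experiments; the run names, document identifiers and scores are hypothetical.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale the scores of one run to the [0, 1] interval."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores are equal: give every document the same weight.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs):
    """Fuse several runs by summing normalized scores per document.

    A document missing from a run contributes 0 for that run.
    Ties are broken by document identifier to keep the order stable.
    """
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical runs for one topic: one built from the title only,
# one built from the title and description sections.
title_run = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
title_desc_run = {"d2": 4.0, "d3": 3.0, "d4": 2.0}

ranking = comb_sum([title_run, title_desc_run])
```

Here the document retrieved by both runs with good scores ("d2") moves to the top of the fused list, which is the behavior one expects from this kind of combination.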
Future work will focus on the contextual
combination of indexing methods and on varying
the query size. Indeed, one may expect that
a query whose terms have many
alternatives but come from various concepts would
be better served by lemma-based indexing. On the other hand, a
query for which few documents are retrieved would
benefit from indexing by truncated terms, which
expands it. Recent work in the literature has
studied how to predict query difficulty in terms of
recall and precision. Other studies have addressed
predicting whether relevant documents can be found
in the collection. Our future work will
take these aspects into account: system fusion will be based
on knowledge of the difficulty of the query
and on other elements of the query context in
order to decide which sections of the query to
consider (query size) and which indexes to
use (either a single type or a combination).
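The selection idea outlined above can be sketched as a simple rule-based chooser. This is only a hypothetical illustration of the intuition, not an implemented or evaluated method: the function name, the query features, the thresholds and the priority given to the first rule are all assumptions made for the example.

```python
def choose_index(num_retrieved, alternatives_per_term, num_concepts,
                 min_retrieved=10, many_alternatives=3.0, several_concepts=2):
    """Pick an index type for one query from simple query features.

    All thresholds are hypothetical placeholders, not tuned values.
    """
    # Terms with many alternatives spread over several concepts:
    # prefer lemmas (assumed here to take priority over the next rule).
    if (alternatives_per_term >= many_alternatives
            and num_concepts >= several_concepts):
        return "lemmas"
    # Few documents retrieved: truncated terms expand the query.
    if num_retrieved < min_retrieved:
        return "truncated"
    # Default: lemmas, the most effective single method in this study.
    return "lemmas"


# Hypothetical queries described by their features.
choice_sparse = choose_index(num_retrieved=3,
                             alternatives_per_term=1.0,
                             num_concepts=1)
choice_varied = choose_index(num_retrieved=500,
                             alternatives_per_term=4.0,
                             num_concepts=3)
```

In practice the features would come from a query difficulty predictor, and the rules would likely be learned rather than fixed by hand.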
ACKNOWLEDGEMENTS
This work has been partially supported by the
French ANR through the TCAN program (ARIEL
Project) and by the European Commission through
the WS-Talk project under the 6th FP (COOP-006026).
However, the views expressed herein are ours.
ICEIS 2008 - International Conference on Enterprise Information Systems