Evidential-Link-based Approach for Re-ranking XML Retrieval Results

M'hamed Mataoui, Mohamed Mezghiche, Faouzi Sebbak, Farid Benhammadi

2014

Abstract

In this paper, we propose a new evidential link-based approach for re-ranking XML retrieval results. The approach, based on the Dempster-Shafer theory of evidence, combines, for each retrieved XML element, content relevance evidence with computed link evidence (score and rank). The use of the Dempster-Shafer theory is motivated by the need to improve retrieval accuracy by accounting for the uncertain nature of both bodies of evidence (content and link relevance). The link score is computed by a new link analysis algorithm based on weighted links, in which relevance is propagated through two types of links: hierarchical and navigational. The amount of relevance score propagated to each retrieved XML element depends on the link weight, which is defined by two parameters: link type and link length. To evaluate the proposal, we carried out a set of experiments on the INEX data collection.
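
As an illustration of the combination step described above, the following minimal Python sketch applies Dempster's rule of combination to two bodies of evidence over a two-hypothesis frame (relevant, not relevant). It is not the authors' implementation: the helper `mass_from_score`, its `reliability` discounting parameter, the function `rerank`, and all numeric values are assumptions introduced here for illustration only. The abstract does not specify how scores are mapped to mass functions.

```python
from itertools import product

# Frame of discernment: an element is relevant (R) or not relevant (N).
R = frozenset({"R"})
N = frozenset({"N"})
THETA = R | N  # the whole frame, i.e. mass left uncommitted (ignorance)


def mass_from_score(score, reliability):
    """Hypothetical mapping from a score in [0, 1] to a mass function.

    Assumption: the source is discounted by `reliability`, and the
    remaining mass (1 - reliability) is assigned to the whole frame.
    """
    return {
        R: reliability * score,
        N: reliability * (1.0 - score),
        THETA: 1.0 - reliability,
    }


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function is a dict mapping frozenset -> mass.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass falling on the empty set measures the conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {subset: w / norm for subset, w in combined.items()}


def rerank(elements, content_reliability=0.9, link_reliability=0.5):
    """Sort (id, content_score, link_score) tuples by combined mass on R.

    The reliability values are illustrative assumptions, not taken
    from the paper.
    """
    scored = []
    for elem_id, content_score, link_score in elements:
        m_content = mass_from_score(content_score, content_reliability)
        m_link = mass_from_score(link_score, link_reliability)
        fused = combine(m_content, m_link)
        scored.append((elem_id, fused.get(R, 0.0)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up element identifiers and scores.
    results = [("e1", 0.8, 0.2), ("e2", 0.7, 0.9), ("e3", 0.4, 0.6)]
    for elem_id, belief in rerank(results):
        print(elem_id, round(belief, 3))
```

In this sketch the combined mass on the singleton "relevant" serves as the re-ranking key. Other decision criteria from the Dempster-Shafer framework, such as plausibility, could be used instead.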


Paper Citation


in Harvard Style

Mataoui M., Mezghiche M., Sebbak F. and Benhammadi F. (2014). Evidential-Link-based Approach for Re-ranking XML Retrieval Results. In Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-035-2, pages 64-71. DOI: 10.5220/0005003900640071


in Bibtex Style

@conference{data14,
author={M'hamed Mataoui and Mohamed Mezghiche and Faouzi Sebbak and Farid Benhammadi},
title={Evidential-Link-based Approach for Re-ranking XML Retrieval Results},
booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2014},
pages={64-71},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005003900640071},
isbn={978-989-758-035-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Evidential-Link-based Approach for Re-ranking XML Retrieval Results
SN - 978-989-758-035-2
AU - Mataoui M.
AU - Mezghiche M.
AU - Sebbak F.
AU - Benhammadi F.
PY - 2014
SP - 64
EP - 71
DO - 10.5220/0005003900640071