A NOVEL QUERY EXPANSION TECHNIQUE BASED ON A MIXED GRAPH OF TERMS

Fabio Clarizia, Francesco Colace, Massimo De Santo, Luca Greco, Paolo Napoletano

2011

Abstract

It is well known that one way to improve the accuracy of a text retrieval system is to expand the original query with additional knowledge encoded as topic-related terms. In an interactive environment, the expansion, usually represented as a list of words, is extracted from documents whose relevance is known from user feedback. In this paper we argue that the accuracy of a text retrieval system can be improved by using a query expansion method based on a mixed Graph of Terms representation instead of one based on a simple list of words. The graph, which is composed of a directed and an undirected subgraph, can be automatically extracted from a small set of relevant documents only (namely the user feedback) using a term extraction method based on the probabilistic Topic Model. The proposed method has been evaluated by comparing it with two less complex structures: a set of pairs of words and a simple list of words.
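
To make the three structures named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The class `MixedGraphOfTerms`, its method names, the edge weights, and the `expand_query` helper are all hypothetical, and the graph is filled in by hand here rather than learned with a topic model. It only illustrates how a mixed graph (undirected plus directed edges) can be reduced to the two simpler baselines: a set of word pairs and a plain list of words.

```python
from dataclasses import dataclass, field


@dataclass
class MixedGraphOfTerms:
    """Toy mixed graph: undirected term-term edges plus directed edges.

    Hypothetical structure for illustration; weights are made up.
    """
    # Undirected edges are keyed by a frozenset so (a, b) and (b, a) coincide.
    undirected: dict = field(default_factory=dict)
    # Directed edges are keyed by an ordered (source, target) tuple.
    directed: dict = field(default_factory=dict)

    def add_undirected(self, a, b, weight):
        self.undirected[frozenset((a, b))] = weight

    def add_directed(self, source, target, weight):
        self.directed[(source, target)] = weight

    def as_word_pairs(self):
        """Drop edge direction: a set of weighted, unordered word pairs."""
        pairs = dict(self.undirected)
        for (source, target), weight in self.directed.items():
            key = frozenset((source, target))
            pairs[key] = max(weight, pairs.get(key, 0.0))
        return pairs

    def as_word_list(self):
        """Drop all edges: a sorted list of the distinct words."""
        words = set()
        for pair in self.undirected:
            words.update(pair)
        for source, target in self.directed:
            words.update((source, target))
        return sorted(words)


def expand_query(query_terms, graph):
    """Simplest expansion: append graph words not already in the query."""
    seen = set(query_terms)
    return list(query_terms) + [w for w in graph.as_word_list() if w not in seen]


# Hand-built example graph (assumed terms and weights).
graph = MixedGraphOfTerms()
graph.add_undirected("retrieval", "query", 0.8)
graph.add_directed("retrieval", "ranking", 0.6)
graph.add_directed("query", "expansion", 0.7)

expanded = expand_query(["query"], graph)
```

In this sketch `expand_query` uses only the word-list view, which corresponds to the simplest baseline; a graph-based expansion would additionally use the edge weights and directions when building the expanded query.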



Paper Citation


in Harvard Style

Clarizia F., Colace F., De Santo M., Greco L. and Napoletano P. (2011). A NOVEL QUERY EXPANSION TECHNIQUE BASED ON A MIXED GRAPH OF TERMS. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 84-93. DOI: 10.5220/0003660500840093


in Bibtex Style

@conference{kdir11,
author={Fabio Clarizia and Francesco Colace and Massimo De Santo and Luca Greco and Paolo Napoletano},
title={A NOVEL QUERY EXPANSION TECHNIQUE BASED ON A MIXED GRAPH OF TERMS},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={84-93},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003660500840093},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - A NOVEL QUERY EXPANSION TECHNIQUE BASED ON A MIXED GRAPH OF TERMS
SN - 978-989-8425-79-9
AU - Clarizia F.
AU - Colace F.
AU - De Santo M.
AU - Greco L.
AU - Napoletano P.
PY - 2011
SP - 84
EP - 93
DO - 10.5220/0003660500840093