A Novel Approach to Query Expansion based on Semantic Similarity Measures

Flora Amato, Aniello De Santo, Francesco Gargiulo, Vincenzo Moscato, Fabio Persia, Antonio Picariello, Giancarlo Sperlì

2015

Abstract

In this paper, we present a framework supporting information retrieval over corpora of documents using an automatic semantic query expansion approach. The main idea is to expand the set of words used as query terms by exploiting the notion of semantic similarity between the concepts related to the search terms. We leverage existing lexical resources and similarity metrics computed among terms to generate, through a suitable mapping into a vector space, an index for the fast retrieval of a set of terms “semantically correlated” to a given query term. The vector of expanded terms is then exploited in the query stage to retrieve documents that are significantly related to specific combinations of the query terms. Preliminary experimental results concerning the efficiency and effectiveness of the proposed approach are reported and discussed.
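
The general idea of similarity-driven query expansion can be illustrated with a minimal sketch. It is not the framework described in the paper: the toy taxonomy, the Wu-Palmer-style similarity measure, the parameters k and threshold, and all function names are assumptions chosen for illustration, and the vector-space index is replaced here by a brute-force scan over a tiny vocabulary.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "animal": "entity",
    "dog": "animal",
}


def ancestors(term):
    """Return the path from term up to the root, term included."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def depth(term):
    """Depth in the taxonomy; the root has depth 1."""
    return len(ancestors(term))


def similarity(a, b):
    """Wu-Palmer-style score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    in_b = set(ancestors(b))
    # The root is shared by every term, so a common subsumer always exists.
    lcs = next(t for t in ancestors(a) if t in in_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def expand(term, k=2, threshold=0.6):
    """Return the term (weight 1.0) plus up to k similar terms above threshold."""
    scored = [(t, similarity(term, t)) for t in PARENT if t != term]
    scored = [(t, s) for t, s in scored if s >= threshold]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [(term, 1.0)] + scored[:k]


def search(query_terms, docs, k=2, threshold=0.6):
    """Rank documents by the summed weights of the expanded terms they contain."""
    weights = {}
    for q in query_terms:
        for t, s in expand(q, k, threshold):
            weights[t] = max(weights.get(t, 0.0), s)
    scores = {}
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())
        score = sum(w for t, w in weights.items() if t in tokens)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


docs = {
    "d1": "the truck was parked",
    "d2": "a dog barked",
    "d3": "car and vehicle registry",
}
ranking = search(["car"], docs)
```

With the query term "car", the expansion adds "vehicle" and "truck", so document d1 is retrieved even though it never mentions "car". A real system would draw similarities from a lexical resource and would use an index rather than scanning the whole vocabulary for every query term.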

Paper Citation


in Harvard Style

Amato F., De Santo A., Gargiulo F., Moscato V., Persia F., Picariello A. and Sperlì G. (2015). A Novel Approach to Query Expansion based on Semantic Similarity Measures. In Proceedings of 4th International Conference on Data Management Technologies and Applications - Volume 1: KomIS, (DATA 2015) ISBN 978-989-758-103-8, pages 344-353. DOI: 10.5220/0005579703440353


in Bibtex Style

@conference{komis15,
author={Flora Amato and Aniello De Santo and Francesco Gargiulo and Vincenzo Moscato and Fabio Persia and Antonio Picariello and Giancarlo Sperlì},
title={A Novel Approach to Query Expansion based on Semantic Similarity Measures},
booktitle={Proceedings of 4th International Conference on Data Management Technologies and Applications - Volume 1: KomIS, (DATA 2015)},
year={2015},
pages={344-353},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005579703440353},
isbn={978-989-758-103-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of 4th International Conference on Data Management Technologies and Applications - Volume 1: KomIS, (DATA 2015)
TI - A Novel Approach to Query Expansion based on Semantic Similarity Measures
SN - 978-989-758-103-8
AU - Amato F.
AU - De Santo A.
AU - Gargiulo F.
AU - Moscato V.
AU - Persia F.
AU - Picariello A.
AU - Sperlì G.
PY - 2015
SP - 344
EP - 353
DO - 10.5220/0005579703440353