Semantic Similarity between Queries in QA System using a Domain-specific Taxonomy

Hilda Kosorus, Andreas Bögl, Josef Küng

Abstract

Semantic similarity has been studied extensively over the past decades and has become a rapidly growing field of research. Sentence and short-text similarity measures play an important role in text-based applications such as text mining, information retrieval and question answering systems. In this paper we consider the problem of semantic similarity between queries in a question answering system for the purpose of query recommendation. Our approach is based on an existing domain-specific taxonomy. We define three-layered semantic similarity measures between queries by combining existing similarity measures between ontology concepts with various set-based distance measures. We then analyse and evaluate our approach against human intuition using a data set of 90 questions. Furthermore, we argue that these measures are taxonomy-dependent and are influenced by several factors: the taxonomy structure, keyword mappings, keyword weights, query-keyword mappings and the chosen concept similarity measure.
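
The general idea of combining a concept similarity measure with a set-based measure can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy taxonomy, the choice of the Wu-Palmer measure as the concept similarity, the symmetric best-match average as the set-based measure, and the function names. Keyword weights and the query-keyword mapping layer are left out.

```python
# Hypothetical toy taxonomy (child -> parent); not the taxonomy used in the paper.
PARENT = {
    "science": None,
    "physics": "science",
    "biology": "science",
    "mechanics": "physics",
    "optics": "physics",
    "genetics": "biology",
}


def path_to_root(concept):
    """Return the concepts from `concept` up to and including the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first concept that also subsumes a
    # is the lowest common subsumer (lcs).
    lcs = next(c for c in path_to_root(b) if c in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def query_similarity(concepts_a, concepts_b):
    """Assumed set-based measure: symmetric average of best matches."""
    def directed(src, dst):
        # For each source concept take its best match in the other set.
        return sum(max(wu_palmer(s, d) for d in dst) for s in src) / len(src)

    return (directed(concepts_a, concepts_b) + directed(concepts_b, concepts_a)) / 2


# Two queries, each already mapped to a set of taxonomy concepts.
print(query_similarity(["mechanics", "optics"], ["optics"]))
print(query_similarity(["mechanics"], ["genetics"]))
```

In this sketch, two queries whose concepts share a deep common ancestor score higher than queries whose concepts meet only at the root, which is why the result depends on the taxonomy structure and on the chosen concept similarity measure.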



Paper Citation


in Harvard Style

Kosorus H., Bögl A. and Küng J. (2012). Semantic Similarity between Queries in QA System using a Domain-specific Taxonomy. In Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8565-10-5, pages 241-246. DOI: 10.5220/0003965902410246


in Bibtex Style

@conference{iceis12,
author={Hilda Kosorus and Andreas Bögl and Josef Küng},
title={Semantic Similarity between Queries in QA System using a Domain-specific Taxonomy},
booktitle={Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2012},
pages={241-246},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003965902410246},
isbn={978-989-8565-10-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Semantic Similarity between Queries in QA System using a Domain-specific Taxonomy
SN - 978-989-8565-10-5
AU - Kosorus H.
AU - Bögl A.
AU - Küng J.
PY - 2012
SP - 241
EP - 246
DO - 10.5220/0003965902410246