WHAT IS THE RELATIONSHIP ABOUT? - Extracting Information about Relationships from Wikipedia

Brigitte Mathiak, Victor Manuel Martínez Peña, Andias Wira-Alam

Abstract

What is the relationship between two terms? Document analysis tells us that "Crime" is close to "Victim" and not so close to "Banana". For common terms like "Sun" and "Light", the nature of the relationship is clear, but the measure becomes fuzzier when dealing with less commonly used terms and concepts or with partial information. Semantic relatedness is typically calculated from an encyclopedia like Wikipedia, but Wikipedia contains a lot of information that is not common knowledge. So when a computer calculates that Belarus and Ukraine are closely related, what does that mean to me as a human? In this paper, we look at perceived relationships and qualify them in a human-readable way. The result is a search engine designed to take two terms and explain how they relate to each other. We evaluate it in a user study that gauges how useful this extra information is to humans when they judge relationships.



Paper Citation


in Harvard Style

Mathiak B., Martínez Peña V. M. and Wira-Alam A. (2012). WHAT IS THE RELATIONSHIP ABOUT? - Extracting Information about Relationships from Wikipedia. In Proceedings of the 8th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8565-08-2, pages 625-632. DOI: 10.5220/0003936506250632


in Bibtex Style

@conference{webist12,
author={Brigitte Mathiak and Victor Manuel Martínez Peña and Andias Wira-Alam},
title={WHAT IS THE RELATIONSHIP ABOUT? - Extracting Information about Relationships from Wikipedia},
booktitle={Proceedings of the 8th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2012},
pages={625-632},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003936506250632},
isbn={978-989-8565-08-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - WHAT IS THE RELATIONSHIP ABOUT? - Extracting Information about Relationships from Wikipedia
SN - 978-989-8565-08-2
AU - Mathiak B.
AU - Martínez Peña V. M.
AU - Wira-Alam A.
PY - 2012
SP - 625
EP - 632
DO - 10.5220/0003936506250632