Word Sense Discrimination on Tweets: A Graph-based Approach

Flavio Massimiliano Cecchini, Elisabetta Fersini, Enza Messina

Abstract

This paper details an unsupervised, graph-based approach to word sense discrimination on tweets. We address the problem by constructing a word graph of co-occurrences. Defining a distance on this graph yields a word metric space, to which we apply an aggregative algorithm for word clustering. The resulting word clusters represent contexts that discriminate the possible senses of a term. We present experimental results both on a data set of tweets we collected and on the data set of Task 14 at SemEval-2010.
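The pipeline outlined in the abstract (co-occurrence graph, distance, aggregative clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the Jaccard distance on closed neighbourhoods used as a stand-in distance, the single-linkage merging rule, and the 0.5 threshold.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(tweets):
    """Link two words whenever they appear in the same tweet."""
    graph = defaultdict(set)
    for tweet in tweets:
        words = set(tweet.lower().split())
        for u, v in combinations(words, 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def jaccard_distance(graph, u, v):
    """Stand-in distance (assumption): 1 - Jaccard similarity of closed neighbourhoods."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)


def discriminate(graph, target, threshold=0.5):
    """Cluster the neighbours of `target` by single-linkage aggregation.

    The target itself is removed first, so that it does not tie all of its
    contexts together. The threshold is a hypothetical parameter.
    """
    neighbours = sorted(graph[target])
    sub = {w: graph[w] - {target} for w in neighbours}
    clusters = [{w} for w in neighbours]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(clusters, 2):
            # Merge two clusters if any pair of their words is close enough.
            if any(jaccard_distance(sub, u, v) <= threshold for u in a for v in b):
                clusters.remove(a)
                clusters.remove(b)
                clusters.append(a | b)
                merged = True
                break
    return clusters


# Toy data, invented for this example.
tweets = [
    "apple iphone launch",
    "apple iphone price",
    "apple pie recipe",
    "apple pie cinnamon",
]
senses = discriminate(cooccurrence_graph(tweets), "apple")
print(sorted(sorted(cluster) for cluster in senses))
```

On this toy input the neighbours of "apple" fall into two clusters, one around "iphone" and one around "pie", which is the kind of context grouping the abstract describes. Real tweets would also need tokenisation, stop-word filtering and edge weighting, which this sketch omits.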


Paper Citation


in Harvard Style

Cecchini F., Fersini E. and Messina E. (2015). Word Sense Discrimination on Tweets: A Graph-based Approach. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015), ISBN 978-989-758-158-8, pages 138-146. DOI: 10.5220/0005640501380146


in Bibtex Style

@conference{kdir15,
author={Flavio Massimiliano Cecchini and Elisabetta Fersini and Enza Messina},
title={Word Sense Discrimination on Tweets: A Graph-based Approach},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={138-146},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005640501380146},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - Word Sense Discrimination on Tweets: A Graph-based Approach
SN - 978-989-758-158-8
AU - Cecchini F.
AU - Fersini E.
AU - Messina E.
PY - 2015
SP - 138
EP - 146
DO - 10.5220/0005640501380146