Word Sense Discrimination on Tweets: A Graph-based Approach
Flavio Massimiliano Cecchini, Elisabetta Fersini, Enza Messina
2015
Abstract
This paper details an unsupervised, graph-based approach to word sense discrimination on tweets. We construct a word graph of co-occurrences and define a distance on it, obtaining a word metric space to which we apply an aggregative algorithm for word clustering. The resulting word clusters represent contexts that discriminate the possible senses of a term. We present experimental results on a data set of tweets we collected and on the data set of Task 14 at SemEval-2010.
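The pipeline outlined in the abstract (co-occurrence graph, distance, clustering) can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the distance or the clustering algorithm, so the Jaccard distance between the sets of tweets containing two words, the single-linkage merging rule, the `threshold` parameter, and all function names below are assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(tweets):
    """Count, over tokenized tweets, how many tweets contain each word and each word pair."""
    word_count = Counter()
    pair_count = Counter()
    for tokens in tweets:
        words = sorted(set(tokens))
        word_count.update(words)
        # Pairs are stored as (a, b) with a < b, since `words` is sorted.
        pair_count.update(combinations(words, 2))
    return word_count, pair_count


def distance(u, v, word_count, pair_count):
    """Assumed distance: Jaccard distance between the sets of tweets containing u and v."""
    if u == v:
        return 0.0
    both = pair_count[(u, v) if u < v else (v, u)]
    either = word_count[u] + word_count[v] - both
    return 1.0 - both / either


def sense_clusters(target, tweets, threshold=0.7):
    """Cluster the words co-occurring with `target` by single-linkage merging (an assumption)."""
    word_count, pair_count = cooccurrence_graph(tweets)
    # Context words: every word that shares at least one tweet with the target.
    context = sorted(
        w for w in word_count
        if w != target and distance(target, w, word_count, pair_count) < 1.0
    )
    parent = {w: w for w in context}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    # Merge any two context words that lie within the distance threshold.
    for u, v in combinations(context, 2):
        if distance(u, v, word_count, pair_count) <= threshold:
            parent[find(u)] = find(v)

    clusters = defaultdict(set)
    for w in context:
        clusters[find(w)].add(w)
    return sorted(clusters.values(), key=sorted)


# Toy, hand-made data (not from the paper's data sets).
example_tweets = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "price"],
    ["apple", "pie", "recipe"],
    ["apple", "pie", "cinnamon"],
]
example_clusters = sense_clusters("apple", example_tweets)
```

On this toy input the context words of "apple" separate into two clusters, one around "iphone" and one around "pie", which is the kind of context grouping the abstract describes as discriminating senses. The target word is removed before clustering because it co-occurs with every context word and would otherwise link all of them together.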
Paper Citation
in Harvard Style
Cecchini F., Fersini E. and Messina E. (2015). Word Sense Discrimination on Tweets: A Graph-based Approach. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015) ISBN 978-989-758-158-8, pages 138-146. DOI: 10.5220/0005640501380146
in Bibtex Style
@conference{kdir15,
author={Flavio Massimiliano Cecchini and Elisabetta Fersini and Enza Messina},
title={Word Sense Discrimination on Tweets: A Graph-based Approach},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={138-146},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005640501380146},
isbn={978-989-758-158-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - Word Sense Discrimination on Tweets: A Graph-based Approach
SN - 978-989-758-158-8
AU - Cecchini F.
AU - Fersini E.
AU - Messina E.
PY - 2015
SP - 138
EP - 146
DO - 10.5220/0005640501380146