Unsupervised Irony Detection: A Probabilistic Model with Word Embeddings

Debora Nozza, Elisabetta Fersini, Enza Messina

Abstract

The automatic detection of figurative language, such as irony and sarcasm, is one of the most challenging tasks of Natural Language Processing (NLP). This is because machine learning methods can be easily misled by the presence of words that have a strong polarity but are used ironically, meaning that the opposite polarity was intended. In this paper, we propose an unsupervised framework for domain-independent irony detection. In particular, to derive an unsupervised Topic-Irony Model (TIM), we built upon an existing probabilistic topic model originally introduced for sentiment analysis. Moreover, in order to improve its generalization abilities, we took advantage of Word Embeddings to obtain a domain-aware ironic orientation of words. This is the first work that addresses this task in an unsupervised setting and the first study of the topic-irony distribution. Experimental results show that TIM is comparable with, and sometimes even better than, supervised state-of-the-art approaches for irony detection. Moreover, when integrating the probabilistic model with word embeddings (TIM+WE), promising results are obtained in a more complex, real-world scenario.
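To illustrate the general idea of deriving a domain-aware ironic orientation of words from embeddings, the sketch below scores a word by its average cosine similarity to a set of seed words. This is a minimal, hypothetical example and not the TIM+WE model itself: the toy vectors, the seed words, and the averaging scheme are all assumptions made for illustration; in practice the embeddings would come from a model such as word2vec (Mikolov et al., 2013) trained on in-domain text.

```python
import numpy as np

# Toy 4-dimensional embeddings (illustrative only; real embeddings would be
# learned from an in-domain corpus with a model such as word2vec).
embeddings = {
    "great":   np.array([0.9, 0.1, 0.2, 0.0]),
    "awesome": np.array([0.8, 0.2, 0.1, 0.1]),
    "monday":  np.array([0.1, 0.9, 0.3, 0.2]),
    "traffic": np.array([0.2, 0.8, 0.4, 0.1]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def ironic_orientation(word, seeds, embeddings):
    """Score a word by its mean cosine similarity to ironic seed words."""
    return float(np.mean([cosine(embeddings[word], embeddings[s]) for s in seeds]))

# Hypothetical seed words assumed to occur frequently in ironic contexts
# within the target domain.
seeds = ["monday", "traffic"]

score_monday = ironic_orientation("monday", seeds, embeddings)
score_great = ironic_orientation("great", seeds, embeddings)
```

Under this sketch, a word whose embedding lies close to the ironic seeds (here, "monday") receives a higher orientation score than a word far from them (here, "great"), which is the kind of domain-aware signal the embeddings contribute to the probabilistic model.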

References

  1. Bamman, D. and Smith, N. A. (2015). Contextualized sarcasm detection on twitter. In Proceedings of the 9th International AAAI Conference on Web and Social Media, pages 574-577.
  2. Barbieri, F. and Saggion, H. (2014). Modelling irony in twitter. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 56-64.
  3. Bengio, Y., Schwenk, H., Senécal, J.-S., Morin, F., and Gauvain, J.-L. (2006). Neural probabilistic language models. In Innovations in Machine Learning: Theory and Applications, pages 137-186. Springer.
  4. Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent dirichlet allocation. The Journal of Machine Learning Research, 3:993-1022.
  5. Blitzer, J., Dredze, M., and Pereira, F. (2007). Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 440-447.
  6. Bosco, C., Patti, V., and Bolioli, A. (2013). Developing corpora for sentiment analysis: The case of irony and senti-tut. IEEE Intelligent Systems, 28(2):55-63.
  7. Colston, H. and Gibbs, R. (2007). A brief history of irony. In Irony in language and thought: A cognitive science reader, pages 3-21. Lawrence Erlbaum Assoc Incorporated.
  8. Davidov, D., Tsur, O., and Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in twitter and amazon. In Proceedings of the 14th Conference on Computational Natural Language Learning, pages 107-116. Association for Computational Linguistics.
  9. Corbett, E. P. J. and Connors, R. J. (1971). Classical rhetoric for the modern student. Oxford University Press.
  10. Esuli, A. and Sebastiani, F. (2006). Sentiwordnet: A publicly available lexical resource for opinion mining. In Proceedings of the 5th Conference on Language Resources and Evaluation, volume 6, pages 417-422. Citeseer.
  11. Fersini, E., Pozzi, F. A., and Messina, E. (2015). Detecting irony and sarcasm in microblogs: The role of expressive signals and ensemble classifiers. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics, pages 1-8. IEEE.
  12. Ghosh, A., Li, G., Veale, T., Rosso, P., Shutova, E., Barnden, J., and Reyes, A. (2015a). Semeval-2015 task 11: Sentiment analysis of figurative language in twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation, pages 470-478.
  13. Ghosh, D., Guo, W., and Muresan, S. (2015b). Sarcastic or not: Word embeddings to predict the literal or sarcastic meaning of words. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1003-1012.
  14. González-Ibáñez, R., Muresan, S., and Wacholder, N. (2011). Identifying sarcasm in twitter: a closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers-Volume 2, pages 581-586. Association for Computational Linguistics.
  15. Hernández-Farías, I., Benedí, J.-M., and Rosso, P. (2015). Applying basic features from sentiment analysis for automatic irony detection. In Pattern Recognition and Image Analysis, pages 337-344. Springer.
  16. Huang, E. H., Socher, R., Manning, C. D., and Ng, A. Y. (2012). Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pages 873-882. Association for Computational Linguistics.
  17. Jijkoun, V., de Rijke, M., and Weerkamp, W. (2010). Generating focused topic-specific sentiment lexicons. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 585-594. Association for Computational Linguistics.
  18. Jo, Y. and Oh, A. H. (2011). Aspect and sentiment unification model for online review analysis. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM '11, pages 815-824, New York, NY, USA. ACM.
  19. Kaji, N. and Kitsuregawa, M. (2007). Building lexicon for sentiment analysis from massive collection of html documents. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1075-1083, Prague, Czech Republic. Association for Computational Linguistics.
  20. Katz, A. N., Colston, H., and Katz, A. (2005). Discourse and sociocultural factors in understanding nonliteral language. In Figurative language comprehension: Social and cultural influences, pages 183-207. Lawrence Erlbaum Associates, Inc., Mahwah, NJ.
  21. Lin, C. and He, Y. (2009). Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 375-384. ACM.
  22. Lu, Y., Castellanos, M., Dayal, U., and Zhai, C. (2011). Automatic construction of a context-aware sentiment lexicon: An optimization approach. In Proceedings of the 20th International Conference on World Wide Web, pages 347-356. ACM.
  23. McDonald, S. (1999). Exploring the process of inference generation in sarcasm: A review of normal and clinical studies. Brain and Language, 68(3):486-506.
  24. Mei, Q., Ling, X., Wondra, M., Su, H., and Zhai, C. (2007). Topic sentiment mixture: modeling facets and opinions in weblogs. In Proceedings of the 16th International Conference on World Wide Web, pages 171-180. ACM.
  25. Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781:1-12.
  26. Mohammad, S., Dunne, C., and Dorr, B. (2009). Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2-Volume 2, pages 599-608. Association for Computational Linguistics.
  27. Pedersen, T., Patwardhan, S., and Michelizzi, J. (2004). Wordnet::similarity: Measuring the relatedness of concepts. In Demonstration Papers at HLT-NAACL 2004, HLT-NAACL-Demonstrations '04, pages 38-41, Stroudsburg, PA, USA. Association for Computational Linguistics.
  28. Ptáček, T., Habernal, I., and Hong, J. (2014). Sarcasm detection on czech and english twitter. In Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers, pages 213-223, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.
  29. Rajadesingan, A., Zafarani, R., and Liu, H. (2015). Sarcasm detection on twitter: A behavioral modeling approach. In Proceedings of the 8th ACM International Conference on Web Search and Data Mining, pages 97-106. ACM.
  30. Rao, D. and Ravichandran, D. (2009). Semi-supervised polarity lexicon induction. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 675-682. Association for Computational Linguistics.
  31. Reyes, A. and Rosso, P. (2014). On the difficulty of automatically detecting irony: beyond a simple case of negation. Knowledge and Information Systems, 40(3):595-614.
  32. Reyes, A., Rosso, P., and Veale, T. (2013). A multidimensional approach for detecting irony in twitter. Language resources and evaluation, 47(1):239-268.
  33. Riloff, E., Qadir, A., Surve, P., De Silva, L., Gilbert, N., and Huang, R. (2013). Sarcasm as contrast between a positive sentiment and negative situation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 704-714. Association for Computational Linguistics.
  34. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323:533-536.
  35. Turian, J., Ratinov, L., and Bengio, Y. (2010). Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384-394. Association for Computational Linguistics.
  36. Weitzel, L., Prati, R. C., and Aguiar, R. F. (2016). The Comprehension of Figurative Language: What Is the Influence of Irony and Sarcasm on NLP Techniques?, pages 49-74. Springer International Publishing.


Paper Citation


in Harvard Style

Nozza D., Fersini E. and Messina E. (2016). Unsupervised Irony Detection: A Probabilistic Model with Word Embeddings. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016) ISBN 978-989-758-203-5, pages 68-76. DOI: 10.5220/0006052000680076


in Bibtex Style

@conference{kdir16,
author={Debora Nozza and Elisabetta Fersini and Enza Messina},
title={Unsupervised Irony Detection: A Probabilistic Model with Word Embeddings},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},
year={2016},
pages={68-76},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006052000680076},
isbn={978-989-758-203-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - Unsupervised Irony Detection: A Probabilistic Model with Word Embeddings
SN - 978-989-758-203-5
AU - Nozza D.
AU - Fersini E.
AU - Messina E.
PY - 2016
SP - 68
EP - 76
DO - 10.5220/0006052000680076