SENTIMENT ANALYSIS RELOADED - A Comparative Study on Sentiment Polarity Identification Combining Machine Learning and Subjectivity Features

Ulli Waltinger

2010

Abstract

This paper presents an empirical study on machine learning-based sentiment analysis. Although polarity classification has been studied extensively at different document-structure levels (e.g. document, sentence, word), little work has investigated feature selection methods and subjectivity resources. We systematically analyze four English subjectivity resources for the task of sentiment polarity identification. The results show that dictionary size clearly correlates with polarity-based feature coverage, but not with classification accuracy. Polarity-based feature selection that considers a minimum number of prior polarity features, combined with SVM-based machine learning, performs best (acc=84.1, f1=83.9) compared with classical approaches to polarity identification. Based on the findings of the English experiments, a new German subjectivity resource is proposed for German sentiment analysis. With f1=85.9, the experimental results indicate that it adapts well to the new domain.
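
The two quantities the abstract contrasts, polarity-based feature coverage and polarity-based feature selection, can be illustrated with a minimal sketch. It is not the paper's implementation: the toy lexicon, the function names, and the reading of "a minimum amount of prior polarity features" as a per-document threshold (`min_features`) are all assumptions made for illustration. The SVM step is left out; the selected feature counts would be handed to whatever classifier is used.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> prior polarity (+1 positive, -1 negative).
# A real subjectivity resource would be far larger.
TOY_LEXICON = {"good": 1, "great": 1, "bad": -1, "boring": -1}


def polarity_features(tokens, lexicon):
    """Keep only the tokens that carry a prior polarity in the lexicon."""
    return Counter(t for t in tokens if t in lexicon)


def feature_coverage(documents, lexicon):
    """Fraction of distinct corpus tokens that appear in the lexicon."""
    vocabulary = {t for doc in documents for t in doc}
    if not vocabulary:
        return 0.0
    return len(vocabulary & set(lexicon)) / len(vocabulary)


def select_documents(documents, lexicon, min_features=1):
    """Return (index, feature counts) for documents that contain at least
    min_features polarity-bearing tokens (assumed threshold semantics)."""
    selected = []
    for i, doc in enumerate(documents):
        feats = polarity_features(doc, lexicon)
        if sum(feats.values()) >= min_features:
            selected.append((i, feats))
    return selected


docs = [
    "a great film with a good cast".split(),
    "boring plot and bad acting".split(),
    "the film runs two hours".split(),
]

coverage = feature_coverage(docs, TOY_LEXICON)
selected = select_documents(docs, TOY_LEXICON, min_features=1)
```

In this sketch a larger lexicon can only raise `coverage`, while the feature vectors passed on to the classifier depend on which entries are kept, which is one way to see why coverage and accuracy need not move together.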


Paper Citation


in Harvard Style

Waltinger U. (2010). SENTIMENT ANALYSIS RELOADED - A Comparative Study on Sentiment Polarity Identification Combining Machine Learning and Subjectivity Features. In Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST, ISBN 978-989-674-025-2, pages 203-210. DOI: 10.5220/0002772602030210


in Bibtex Style

@conference{webist10,
author={Ulli Waltinger},
title={SENTIMENT ANALYSIS RELOADED - A Comparative Study on Sentiment Polarity Identification Combining Machine Learning and Subjectivity Features},
booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST},
year={2010},
pages={203-210},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002772602030210},
isbn={978-989-674-025-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST
TI - SENTIMENT ANALYSIS RELOADED - A Comparative Study on Sentiment Polarity Identification Combining Machine Learning and Subjectivity Features
SN - 978-989-674-025-2
AU - Waltinger U.
PY - 2010
SP - 203
EP - 210
DO - 10.5220/0002772602030210