Comparing Methods for Twitter Sentiment Analysis

Evangelos Psomakelis, Konstantinos Tserpes, Dimosthenis Anagnostopoulos, Theodora Varvarigou

Abstract

This work adds to the body of research on the popular problem of sentiment analysis in Twitter. It investigates the most popular document ("tweet") representation methods that feed sentiment evaluation mechanisms. In particular, we study the bag-of-words, n-gram and n-gram graph approaches, and for each of them we evaluate the performance of one lexicon-based and seven learning-based classification algorithms (namely SVM, Naïve Bayesian Networks, Logistic Regression, Multilayer Perceptrons, Best-First Trees, Functional Trees and C4.5), as well as their combinations, using a set of 4451 manually annotated tweets. The results demonstrate the superiority of learning-based methods, and in particular of the n-gram graph approaches, for predicting the sentiment of tweets. They also show that combining classifiers has a marked effect on n-grams, raising the confidence up to 83.15% on 5-grams when using majority vote and a balanced dataset (equal numbers of positive, negative and neutral tweets for training). For n-gram graphs the improvement was small to none, with confidence reaching 94.52% on 4-gram graphs when using orthodromic distance and a threshold of 0.001.
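
As a rough illustration of two ingredients named in the abstract, the following Python sketch extracts n-grams from a tweet and combines several classifiers' labels by majority vote. It is not the authors' implementation. The function names are hypothetical, the character-level n-grams are an assumption (the abstract does not say whether n-grams are built over characters or words), and the tie-breaking rule is likewise an assumption.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return the overlapping character n-grams of `text`, in order.

    Assumption: n-grams are taken over characters; the abstract does
    not specify characters versus words.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def majority_vote(labels):
    """Return the label predicted by most classifiers.

    Assumption: on a tie, the label that appears first in `labels` wins
    (Counter.most_common keeps first-seen order for equal counts).
    """
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical tweet fragment and hypothetical classifier outputs.
    print(char_ngrams("happy", 4))
    print(majority_vote(["positive", "negative", "positive"]))
```

In a pipeline of this kind, each tweet's n-grams would presumably be turned into features for the individual classifiers, and `majority_vote` would be applied to their per-tweet predictions.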


Paper Citation


in Harvard Style

Psomakelis E., Tserpes K., Anagnostopoulos D. and Varvarigou T. (2014). Comparing Methods for Twitter Sentiment Analysis. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 225-232. DOI: 10.5220/0005075302250232


in Bibtex Style

@conference{kdir14,
author={Evangelos Psomakelis and Konstantinos Tserpes and Dimosthenis Anagnostopoulos and Theodora Varvarigou},
title={Comparing Methods for Twitter Sentiment Analysis},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={225-232},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005075302250232},
isbn={978-989-758-048-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - Comparing Methods for Twitter Sentiment Analysis
SN - 978-989-758-048-2
AU - Psomakelis E.
AU - Tserpes K.
AU - Anagnostopoulos D.
AU - Varvarigou T.
PY - 2014
SP - 225
EP - 232
DO - 10.5220/0005075302250232