Mining Tweet Data - Statistic and Semantic Information for Political Tweet Classification

Guillaume Tisserant, Mathieu Roche, Violaine Prince

Abstract

This paper deals with the quality of textual features in messages for tweet classification. The aim of our study is to show how improving the representation of textual data affects the performance of learning algorithms. We first introduce our method YYYYY, which generalises words that are less relevant for tweet classification. Second, we compare and discuss the types of textual features produced by different approaches. More precisely, we discuss the semantic specificity of textual features such as named entities and hashtags.
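
The abstract does not say how relevance is measured or what a word is generalised into. The Python sketch below only illustrates the general idea of generalising less relevant words, under explicit assumptions. Relevance is approximated by inverse document frequency (IDF). A word counts as "less relevant" when its IDF falls below a hypothetical `threshold`. Generalising a word means replacing it with a caller-supplied label, such as a part-of-speech tag, or with a hypothetical placeholder `<GEN>`. The names `idf_scores`, `generalise` and `GENERIC_TOKEN` are illustrative and do not come from the paper. This is not the authors' YYYYY implementation.

```python
import math
from collections import Counter

# Hypothetical placeholder for generalised words (an assumption, not from the paper).
GENERIC_TOKEN = "<GEN>"


def idf_scores(tweets):
    """Inverse document frequency of each word over a list of tokenised tweets."""
    n = len(tweets)
    df = Counter()
    for tokens in tweets:
        df.update(set(tokens))  # count a word at most once per tweet
    return {word: math.log(n / count) for word, count in df.items()}


def generalise(tweets, threshold, labels=None):
    """Replace words whose IDF is below `threshold` by a more general label.

    Assumption: IDF stands in for "relevance"; frequent words get a low IDF.
    `labels` optionally maps a word to a general label (e.g. a POS tag);
    words without a label fall back to GENERIC_TOKEN.
    """
    labels = labels or {}
    idf = idf_scores(tweets)
    return [
        [w if idf[w] >= threshold else labels.get(w, GENERIC_TOKEN) for w in tokens]
        for tokens in tweets
    ]


if __name__ == "__main__":
    tweets = [
        ["the", "vote", "on", "#budget"],
        ["the", "senator", "on", "tv"],
        ["the", "debate", "tonight"],
    ]
    print(generalise(tweets, threshold=0.5, labels={"the": "DET", "on": "PREP"}))
    # → [['DET', 'vote', 'PREP', '#budget'], ['DET', 'senator', 'PREP', 'tv'], ['DET', 'debate', 'tonight']]
```

In this toy run, words that occur in most tweets ("the", "on") are generalised. Rarer, more topical words and the hashtag keep their surface form. A class-aware score computed from labelled tweets could take the place of IDF. The sketch leaves that choice open.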



Paper Citation


in Harvard Style

Tisserant G., Roche M. and Prince V. (2014). Mining Tweet Data - Statistic and Semantic Information for Political Tweet Classification. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014), ISBN 978-989-758-048-2, pages 523-529. DOI: 10.5220/0005170205230529


in BibTeX Style

@conference{sstm14,
author={Guillaume Tisserant and Mathieu Roche and Violaine Prince},
title={Mining Tweet Data - Statistic and Semantic Information for Political Tweet Classification},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014)},
year={2014},
pages={523-529},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005170205230529},
isbn={978-989-758-048-2},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014)
TI  - Mining Tweet Data - Statistic and Semantic Information for Political Tweet Classification
SN  - 978-989-758-048-2
AU  - Tisserant G.
AU  - Roche M.
AU  - Prince V.
PY  - 2014
SP  - 523
EP  - 529
DO  - 10.5220/0005170205230529
ER  - 