gram graphs against dictionary techniques in
capturing the expressed sentiment in a document and
specifically in tweets. This outcome may be
explained by the fact that Twitter users employ a
significant number of abbreviations and Internet
slang terms in their posts. These terms are not
included in any formal dictionary, which may be
why classification with a pre-rated dictionary is
extremely difficult, even when stemming is applied.
In essence, the language used on Twitter constitutes
a dialect of its own, distinct from common English,
and thus calls for a dedicated dictionary. The
results also demonstrated the improvements that
various combinations of NLP methods and machine
learning algorithms can bring to the confidence
rates of some sentiment analysis techniques.
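The coverage problem described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this work: the lexicon, the tokenizer and the function names (TOY_LEXICON, tokenize, lexicon_score, char_ngrams) are hypothetical and chosen only for illustration, and plain character n-grams stand in for the richer n-gram graph representation. The sketch shows that a pre-rated dictionary returns no signal for a post written entirely in slang, whereas character n-grams can still be extracted from any string.

```python
import re

# Hypothetical toy lexicon (assumption); a real pre-rated dictionary is far larger.
TOY_LEXICON = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "hate": -1.0, "terrible": -1.0,
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (naive on purpose)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Return (polarity sum of known tokens, fraction of tokens found in the lexicon)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    known = [t for t in tokens if t in lexicon]
    score = sum(lexicon[t] for t in known)
    coverage = len(known) / len(tokens)
    return score, coverage


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of the lowercased text."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


if __name__ == "__main__":
    # Standard English: two of five tokens are covered by the toy lexicon.
    print(lexicon_score("I love this great phone"))
    # Slang and abbreviations: no token is covered, so the score carries no signal.
    print(lexicon_score("luv dis fone gr8 lol"))
    # Character n-grams are available regardless of vocabulary.
    print(char_ngrams("gr8 fone"))
```

In this sketch the coverage value makes the failure mode explicit: a score of zero with zero coverage means that the dictionary had nothing to say, not that the post is neutral.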
The innovation of this work lies in the meticulous
evaluation of the efficiency of various sentiment
analysis mechanisms on manually annotated datasets,
as well as in demonstrating that methods can be
combined to create new techniques that enhance the
quality of the outcome.
ACKNOWLEDGEMENTS
This work has been supported by the Consensus
project (http://www.consensus-project.eu) and has
been partly funded by the EU Seventh Framework
Programme, theme ICT-2013.5.4: ICT for
Governance and Policy Modelling under Contract
No. 611688.