
 
 
gram graphs against dictionary techniques in 
capturing the expressed sentiment in a document and 
specifically in tweets. This outcome may be 
explained by the fact that Twitter users employ a 
significant number of abbreviations and Internet 
slang terms in their posts. These terms are not 
included in any formal dictionary, which may be 
why classification with a pre-rated dictionary is 
so difficult, even when stemming is applied. In 
essence, the language used on Twitter constitutes a 
dialect of its own, distinct from standard English, 
and a dedicated dictionary would therefore be more 
appropriate. The results also demonstrated the 
improvements that various combinations of NLP 
methods and machine learning algorithms can bring 
to the confidence rates of some sentiment analysis 
techniques. 
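The vocabulary gap described above can be illustrated with a minimal sketch. It is not the dictionary technique evaluated in this work: the lexicon `TOY_LEXICON`, its scores, the helper names and the two example sentences are all hypothetical and serve only to show how slang spellings fall outside a pre-rated word list.

```python
import re

# Hypothetical pre-rated lexicon (word -> polarity); values are illustrative only.
TOY_LEXICON = {"good": 1.0, "great": 1.0, "love": 1.0, "bad": -1.0, "terrible": -1.0}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Return (summed polarity, fraction of tokens that appear in the lexicon)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return float(sum(hits)), len(hits) / len(tokens)


# Two made-up sentences with the same intended sentiment.
formal = "I love this phone, the camera is great"
slangy = "luv dis fone, cam is gr8 lol"

formal_result = lexicon_score(formal)  # two lexicon hits out of eight tokens
slangy_result = lexicon_score(slangy)  # no token matches the lexicon
```

In this toy setting the slang version receives a neutral score simply because none of its tokens is covered, which is the kind of failure a Twitter-specific dictionary would be meant to reduce.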
The contribution of this work lies in the 
meticulous evaluation of the efficiency of 
various sentiment analysis mechanisms using 
manually annotated datasets, as well as in 
demonstrating that methods can be combined into 
new techniques that enhance the quality of the 
outcome. 
ACKNOWLEDGEMENTS 
This work has been supported by the Consensus 
project (http://www.consensus-project.eu) and has 
been partly funded by the EU Seventh Framework 
Programme, theme ICT-2013.5.4: ICT for 
Governance and Policy Modelling under Contract 
No. 611688. 