Authors:
Márcio Guia 1; Rodrigo Rocha Silva 2 and Jorge Bernardino 3
Affiliations:
1 Polytechnic of Coimbra – ISEC, Rua Pedro Nunes, Quinta da Nora, 3030-199 Coimbra, Portugal
2 CISUC – Centre of Informatics and Systems of University of Coimbra, Pinhal de Marrocos, 3030-290 Coimbra, Portugal; FATEC Mogi das Cruzes, São Paulo Technological College, 08773-600 Mogi das Cruzes, Brazil
3 Polytechnic of Coimbra – ISEC, Rua Pedro Nunes, Quinta da Nora, 3030-199 Coimbra, Portugal; CISUC – Centre of Informatics and Systems of University of Coimbra, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
Keyword(s):
Data Mining, Sentiment Analysis, Text Classification, Naïve Bayes, Support Vector Machine, Random Forest, Decision Trees.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Data Mining in Electronic Commerce; Evolutionary Computing; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Process Mining; Soft Computing; Symbolic Systems
Abstract:
Every day, we deal with a large amount of information on the Internet. This information originates from many different places, such as online review sites and social networks. Amid this messy data arises the opportunity to understand the subjective opinion expressed in a text, in particular its polarity. Sentiment Analysis and Text Classification help to extract valuable information from data and to assign a text to one or more target categories according to its content. This paper compares four of the most popular Text Classification algorithms (Naïve Bayes, Support Vector Machine, Decision Trees and Random Forest) on the Amazon Unlocked mobile phone reviews dataset. Moreover, we also study the impact of some attributes (Brand and Price) on the polarity of the review. Our results demonstrate that the Support Vector Machine is the most complete algorithm in this study, achieving the highest values in all the metrics: accuracy, precision, recall, and F1 score.
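For readers unfamiliar with the four metrics named in the abstract, the following is a minimal sketch of how they are commonly computed for a binary polarity task. It is not the authors' evaluation code: the function name `binary_metrics`, the label strings, and the sample labels are all assumptions made for illustration.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="positive"):
    """Return accuracy, precision, recall and F1, treating `positive` as the target class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1  # correctly predicted positive
        elif pred == positive:
            counts["fp"] += 1  # predicted positive, actually negative
        elif truth == positive:
            counts["fn"] += 1  # predicted negative, actually positive
        else:
            counts["tn"] += 1  # correctly predicted negative

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, invented for illustration only (not taken from the paper's dataset).
y_true = ["positive", "positive", "negative", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive", "negative"]
metrics = binary_metrics(y_true, y_pred)
```

Accuracy alone can look high on a skewed review set where most labels share one polarity, which is presumably one reason to report precision, recall and F1 alongside it.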