Authors:
Tatiana Ermakova (1, 2, 3); Max Henke (4); and Benjamin Fabian (5, 4, 6, 3)
Affiliations:
1 Chair of Open Distributed Systems (ODS), Technical University of Berlin, Einsteinufer 25, 10587 Berlin, Germany
2 Competence Center of Electronic Safety and Security Systems for the Public and Industries (ESPRI), Fraunhofer Institute for Open Communication Systems (FOKUS), Kaiserin-Augusta-Allee 31, 10589 Berlin, Germany
3 Weizenbaum Institute for the Networked Society, Hardenbergstraße 32, 10623 Berlin, Germany
4 Hochschule für Telekommunikation Leipzig (HfTL), Gustav-Freytag-Straße 43-45, 04277 Leipzig, Germany
5 e-Government, Technical University of Applied Sciences Wildau (TH Wildau), Hochschulring 1, 15745 Wildau, Germany
6 Information Systems, Humboldt University of Berlin, Spandauer Str. 1, 10178 Berlin, Germany
Keyword(s):
Sentiment Analysis, Machine Learning, Text Classification, Commercial Service, SaaS, Cloud Computing.
Abstract:
Empirical insights into highly promising commercial sentiment analysis solutions that go beyond their vendors’ claims are rare. Moreover, because the field evolves constantly, earlier studies no longer reflect the current situation. The present research aims to evaluate and compare current solutions. Based on tweets about airline service quality, we test the solutions of six vendors with different market power (Amazon, Google, IBM, Microsoft, Lexalytics, and MeaningCloud) and report their accuracy, precision, recall, (macro) F1, time performance, and service level agreements (SLA). For positive and neutral classifications, none of the solutions achieves precision above 70%. For negative classifications, all of them achieve high precision of around 90%; however, only IBM Watson NLU and Google Cloud Natural Language achieve recall above 70% and can thus be considered for application scenarios where negative text detection is a major concern. Overall, our study shows that an independent, critical experimental analysis of sentiment analysis services can provide insights into their general reliability and particular classification accuracy beyond marketing claims. Such an analysis makes it possible to compare solutions on real-world data and to analyze potential weaknesses and margins of error before making an investment.
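To illustrate the reported measures, the following minimal sketch shows how accuracy, per-class precision and recall, and macro F1 can be computed for a three-class sentiment task. The function names and the label set (negative, neutral, positive) are assumptions made for illustration; this is not the evaluation code used in the study.

```python
from collections import Counter

# Assumed label set for illustration only.
LABELS = ("negative", "neutral", "positive")


def accuracy(y_true, y_pred):
    """Share of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return precision, recall, and F1 for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # gold label t was missed

    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)
```

Macro averaging weights every class equally, so weak performance on a less frequent class is not hidden by strong performance on a dominant one.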