also proposed several effectiveness indicators that can
be computed efficiently from given datasets.
The main contributions of our work include:
1. This is the first formal and comprehensive study
known to us that analyzes the appropriateness of
using sentiment analysis on diverse data sets. Our
results can shed light on the limitations of using
sentiment analysis to assess public opinion.
Our results can also help raise awareness of
the potential pitfalls associated with the misuse of
sentiment analysis.
2. We also propose a diverse set of effectiveness indi-
cators that can be computed efficiently from given
datasets to help people determine the appropriate-
ness of using a sentiment analysis tool.
In the following, we first review the current sen-
timent analysis methods and their applications, fol-
lowed by a description of our datasets and the anal-
yses we performed to assess the effectiveness of ap-
plying sentiment analysis on these datasets. Then we
explain our effort in developing a few effectiveness
indicators to help users determine whether a sentiment analysis tool
is appropriate for a given dataset. Finally, we con-
clude the paper by summarizing the main findings and
pointing out a few future directions.
2 RELATED WORK
Sentiment analysis, frequently also called opinion
mining, is broadly defined as the computational study
of opinions, sentiments, and emotions expressed in
text (Pang and Lee, 2008). According to
(Liu, 2012), the task of sentiment analysis is to auto-
matically extract a quintuple from text:

(e_i, a_ij, s_ijkl, h_k, t_l),

where e_i is a target object, a_ij is an aspect or
attribute of e_i, s_ijkl is the sentiment value of
aspect a_ij of entity e_i, h_k is the opinion holder,
and t_l is the time when the opinion is expressed by
the opinion holder.
the sentiment quintuples are extracted from text, they
can be aggregated and analyzed qualitatively or quan-
titatively to derive insights. Extracting the quintuples
from unstructured text, however, is very challenging
due to the complexity of natural language processing
(NLP). For example, a positive or negative sentiment
word may carry opposite orientations in different
application domains; sarcasm is hard to detect; and
coreference resolution, negation handling, and word
sense disambiguation, a few well-known but unsolved
NLP problems, are needed for correct inference.
Since many existing sentiment analysis tools do not
address these problems adequately, they may work
well in simple domains but are not effective for more
complex applications.
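For concreteness, the quintuple above can be represented and aggregated programmatically. The following Python sketch (the class, field names, and sample values are our own illustrations, not part of any cited system) averages sentiment per entity-aspect pair:

```python
from dataclasses import dataclass

# Illustrative container for the sentiment quintuple (e_i, a_ij, s_ijkl, h_k, t_l).
@dataclass(frozen=True)
class SentimentQuintuple:
    entity: str       # e_i: the target object
    aspect: str       # a_ij: an aspect or attribute of the entity
    sentiment: float  # s_ijkl: sentiment value, here in [-1, 1]
    holder: str       # h_k: the opinion holder
    time: str         # t_l: when the opinion was expressed

def aggregate_by_aspect(quintuples):
    """Average sentiment per (entity, aspect) pair -- one simple quantitative view."""
    totals = {}
    for q in quintuples:
        key = (q.entity, q.aspect)
        s, n = totals.get(key, (0.0, 0))
        totals[key] = (s + q.sentiment, n + 1)
    return {key: s / n for key, (s, n) in totals.items()}

quints = [
    SentimentQuintuple("phone", "battery", -0.8, "user1", "2016-01-02"),
    SentimentQuintuple("phone", "battery", -0.4, "user2", "2016-01-05"),
    SentimentQuintuple("phone", "screen", 0.9, "user1", "2016-01-02"),
]
print(aggregate_by_aspect(quints))
```

Once extraction produces such tuples, the qualitative or quantitative analysis step reduces to straightforward grouping and aggregation, as above.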
The methods used in typical sentiment analysis
systems can be divided into lexicon-based and
machine learning-based approaches (Maynard and
Funk, 2012). Since a purely lexicon-based
approach is less common these days, here we focus on
machine learning-based methods. Frequently, a ma-
chine learning-based system also incorporates lexical
features from sentiment lexicons in its analysis.
Machine learning-based sentiment analysis can
be further divided into supervised and unsupervised
learning methods. The supervised methods make use
of a large number of annotated training examples to
build a sentiment classification model. Typical classi-
fication methods include Naive Bayes, maximum en-
tropy classifiers and support vector machines (Pang
et al., 2002). In general, for supervised sentiment
analysis, if the target domain is similar to the source
domain from which the training examples are col-
lected, the prediction accuracy will be close to the
performance reported on the source domain. In
contrast, if the target domain is very different from
the source domain, the
sentiment analysis performance can deteriorate sig-
nificantly. Among existing supervised sentiment anal-
ysis tools, some provide pre-trained models, such as
the Mashape Text-Processing API^1; others require
users to provide labeled data and then train their own
prediction models, such as the Google Prediction
API^2 and the NLTK text classification API^3.
Since annotating a large number of examples with
sentiment labels can be very time consuming, there
are also many unsupervised sentiment analysis sys-
tems that do not require annotated training data.
They often rely on opinion bearing words to per-
form sentiment analysis (Andreevskaia and Bergler,
2006; Wei Peng, 2011). Turney (2002) proposed
a method that classifies reviews by using two seed
words, poor and excellent, to calculate the semantic
orientations of other words and phrases. Read and
Carroll (2009) proposed a weakly supervised
technique that uses a large collection of unlabeled
text to determine sentiment. They used PMI (Turney,
2002), semantic spaces, and distributional similarity
to measure the similarity between words and
polarity prototypes. The results were less dependent
on the domain, topic and time-period represented by
the testing data. In addition, Hu et al. (2013)
investigated whether models of emotion signals can
potentially help sentiment analysis.
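A minimal sketch of Turney-style semantic orientation, assuming sentence-level co-occurrence counts on a toy corpus (the corpus and the smoothing constant are illustrative; Turney's original method used search-engine hit counts with a NEAR operator):

```python
import math
from collections import Counter
from itertools import combinations

# SO(w) = PMI(w, "excellent") - PMI(w, "poor"), estimated from
# sentence-level co-occurrence in a small illustrative corpus.
corpus = [
    "the food was excellent and the service superb",
    "superb acting and an excellent script",
    "poor plot and sluggish pacing",
    "the pacing felt poor and the dialogue weak",
    "excellent cinematography",
    "poor lighting",
]

word_freq = Counter()
pair_freq = Counter()
for sent in corpus:
    words = set(sent.split())
    word_freq.update(words)
    for a, b in combinations(sorted(words), 2):
        pair_freq[(a, b)] += 1

def pmi(w1, w2, n=len(corpus)):
    # Small additive smoothing avoids log(0) for pairs that never co-occur.
    joint = pair_freq[tuple(sorted((w1, w2)))] + 0.01
    return math.log2(joint * n / (word_freq[w1] * word_freq[w2]))

def semantic_orientation(word):
    return pmi(word, "excellent") - pmi(word, "poor")

print(semantic_orientation("superb") > 0)    # co-occurs with "excellent"
print(semantic_orientation("sluggish") < 0)  # co-occurs with "poor"
```

Because only the two seed words are fixed, the orientation of every other word is derived from the data itself, which is why such unsupervised methods need no annotated training examples.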
^1 http://text-processing.com/docs/sentiment.html
^2 https://cloud.google.com/prediction/docs
^3 http://www.nltk.org/api/nltk.classify.html
WEBIST 2016 - 12th International Conference on Web Information Systems and Technologies