Lexicon Expansion System for Domain and Time Oriented Sentiment

Analysis

Nuno Guimaraes

, Luis Torgo

and Alvaro Figueira

CRACS - INESC TEC, Rua do Campo Alegre 1021/1055, Porto, Portugal

LIADD - INESC TEC, Rua do Campo Alegre, 1021/1055, Porto, Portugal

Keywords:

Sentiment Lexicon, Lexicon Expansion, Twitter Sentiment Analysis.

Abstract:

In sentiment analysis the polarity of a text is often assessed recurring to sentiment lexicons, which usually

consist of verbs and adjectives with an associated positive or negative value. However, in short informal texts

like tweets or web comments, the absence of such words does not necessarily indicates that the text lacks

opinion. Tweets like ”First Paris, now Brussels... What can we do?” imply opinion in spite of not using words

present in sentiment lexicons, but rather due to the general sentiment or public opinion associated with terms

in a speciﬁc time and domain. In order to complement general sentiment dictionaries with those domain and

time speciﬁc terms, we propose a novel system for lexicon expansion that automatically extracts the more

relevant and up to date terms on several different domains and then assesses their sentiment through Twitter.

Experimental results on our system show an 82% accuracy on extracting domain and time speciﬁc terms and

80% on correct polarity assessment. The achieved results provide evidence that our lexicon expansion system

can extract and determined the sentiment of terms for domain and time speciﬁc corpora in a fully automatic

form.

1 INTRODUCTION

With the massive growth of Social Web, opinion data

became much more accessible and in larger quanti-

ties. The use of social networks like Twitter or Face-

book and the way users share their feelings regarding

politicians, products, events, companies, and celebri-

ties, through their personal proﬁle has motivated the

interest for further investigation on methods to auto-

matically classify the associated sentiment.

Supervised and unsupervised approaches have

been proposed in sentiment analysis classiﬁcation and

the inclusion of a sentiment lexicon is a common ap-

proach on both. These lexicons are mainly built using

verbs and adjectives since they are the more common

indicators of subjectivity. Although this may work

relatively well in medium and large texts (e.g. reviews

or articles), in small texts like tweets or comments, the

task becomes more difﬁcult due to their short length

format (Kiritchenko et al., 2014). Small texts may

not include any of the words in the sentiment lexicons

and still express a sentiment. Tweets like ”Listening

to Bowie. Still can’t believe it” do not include any

opinion words but have a sentiment associated which

is perceptible to who is aware of the death of the artist

David Bowie. Furthermore, tweets with a sense of

irony can also be misinterpreted by general sentiment

lexicons. For example in the following tweet ”I used

to think that Britain produced best comedy programs

but where else but here could we watch a team like

Sarah Palin and Donald Trump on TV?” words like

”best” could lead to a positive sentiment classiﬁca-

tion. However, the tweet is pointing to an overall neg-

ative sentiment ”disguised” with irony. Our ability to

detect the sentiment in both cases is due to: 1) the

knowledge of events and persons which is achieved

from news (seen on TV, newspapers and Internet) and

2) the knowledge on the public opinion and reactions

to those news. However, this is a feature that current

state of the art sentiment analysis methods do not con-

sider when assessing the polarity of a text.

News have an important role in today’s society.

Up to date information of events in several different

domains keep people aware of what is going on in the

world. That awareness has grown with the rise of the

World Wide Web since news have become much more

accessible and in greater quantity. Furthermore, news

are usually classiﬁed as relevant information and may

transmit a change of opinion on certain entities or

events. For example if an article is released on a ma-

Guimaraes, N., Torgo, L. and Figueira, A.

Lexicon Expansion System for Domain and Time Oriented Sentiment Analysis.

DOI: 10.5220/0006081704630471

In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2016) - Volume 1: KDIR, pages 463-471

ISBN: 978-989-758-203-5

463

jor politician caught in a money laundering scheme,

the public opinion on that person may change. Yet,

there are also some cases where the opinion does not

shift (e.g. advances in the cure for Alzheimer may not

reverse the sentiment on the term ”Alzheimer”).

News headlines due to their short format, ap-

pear to be good sources for relevant terms extraction.

However, the sentiment transmitted in them may not

be the same as the sentiment from public opinion. To

assess the public opinion on news, Twitter makes a

good data source, since it includes millions of users

from famous people to companies and presidents. The

number of tweets and active users is also a factor.

Since June 2015, on average, 500 million tweets are

sent per day. The micro blogging site has also ap-

proximately 316 million users active per month (Twit-

ter, 2015a). Moreover, Twitter provides a public API

allowing the retrieval of tweets, getting user infor-

mation and monitoring tweets in real time making it

straightforward to retrieve large quantities of data for

analysis (Twitter, 2015b). Summarising, Twitter is an

updated and diversiﬁed source of information since

millions of tweets are posted on a daily basis about

different subjects from users with different opinions.

For this reason, we could use Twitter trending

terms to build our sentiment lexicon. However, they

do not always represent global relevance and are nor-

mally very speciﬁc to that social network. On the

other hand, analysing directly headlines (or the full

news article) may not provide a accurate sentiment on

the terms it mentions.

Taking these facts into account, we developed a

system for sentiment lexicon expansion that combines

both. First, determines relevant and up to date terms

from news headlines and then, for each term, it uses

Twitter to determine the current public sentiment on

it. Our main goals for this work are: 1) to assess

the reliability in extracting domain and time speciﬁc

terms for our lexicon expansion method using solely

news headlines and 2) if the polarity assigned by the

sample of tweets containing the terms corresponds to

the polarity of the terms.

The rest of the article is organized as follows.

First, we describe the state of the art on the subject.

Next, we specify the workﬂow of our proposal. Then,

we present the experimental evaluation of our sys-

tem. Finally, we describe some conclusions and fu-

ture work.

2 RELATED WORK

One of the most important parts for achieving high

accuracy on sentiment analysis are ”sentiment lexi-

cons” (or sentiment dictionaries). Each of the words

in these lexicons can have a binary (positive and nega-

tive), ternary (positive, neutral, negative) or numerical

(e.g a -5 to 5 interval) sentiment value. Some studies

also evaluate sentiment as emotions like fear, joy and

sadness (Mohammad and Turney, 2010).

There are three main groups where sentiment

lexicons creation methods can be included. The

ﬁrst is manual labelling where one or several vol-

unteers/workers label a list of words with sentiment

and then, use metrics to determine inter-worker agree-

ment (Mohammad and Turney, 2010; Taboada et al.,

2011; Hutto and Gilbert, 2014; Nielsen, 2011). How-

ever, this approach can be time consuming, increas-

ing with the size of the word list and the number

of different evaluations required for each word. It

can also be expensive if we resort to services like

Mechanical Turk (Amazon, 2016) or CrowdFlower

(CrowdFlower, 2016) where a fee must be paid to

each worker who completes the classiﬁcation task.

Therefore, more automatic ways of creating senti-

ment lexicons were proposed. These require a small

sample of sentiment labelled terms, normally named

”seed words”, and then expanding the lexicon us-

ing those words as base. Two different approaches

have been used for expanding the lexicon in semi-

supervised fashion: thesaurus based approaches and

corpus based approaches.

Thesaurus based approaches rely on other syntac-

tic resources like the General Inquirer (GI) (Stone

et al., 1966) or WordNet (Fellbaum, 1998). Word-

Net is a large lexical resource containing noun, verbs,

adverbs and adjectives grouped by synsets which are

sets of cognitive synonyms. If the word is an adjec-

tive, a set of antonyms is also available. Some works

like SentiWordNet used this features and a small num-

ber of labelled words to expand sentiment lexicons by

assigning the same polarity of a word to its synonyms

and opposite polarity to antonyms (Baccianella et al.,

2010; Esuli and Sebastiani, 2006). However, the au-

thors in (Mohammad et al., 2009) present better sen-

timent accuracy in words than SentiWordNet1.0 by

using a Roget-like thesaurus. Several studies (Kim

and Hovy, 2004; Hu and Liu, 2004a) also used Word-

Net to expand sentiment lexicon, making it one of the

most used resources for lexicon expansion. One of the

major problems on this thesaurus based approaches

is the domain speciﬁc context on each opinion word.

The word ”loud” can have a negative orientation in

a car review but positive sentiment in a speaker re-

view. For more domain speciﬁc lexicon expansion,

the corpus-based approaches are a better solution.

In (Hatzivassiloglou and McKeown, 1997) a cor-

pus based lexicon expansion method is proposed us-

KDIR 2016 - 8th International Conference on Knowledge Discovery and Information Retrieval

464

ing conjunction rules to infer new opinion words spe-

ciﬁc to the domain. For example, in the review ”The

Samsung remote is awesome and easy to use.”, if we

know that ”awesome” has a positive sentiment then,

due to the conjunction AND, we can infer that ”easy”

or ”easy to use” has also a positive sentiment associ-

ated. In the same way, on the video game review ”The

game has beautiful graphics but easy to complete.”, if

we know that ”beautiful” has a positive polarity we

can infer that the conjunction BUT will reverse the

polarity on ”easy”. The authors named this concept

as ”sentiment consistency”.

Another proposal for corpus based lexicon expan-

sion is presented in (Qiu et al., 2011). It uses a set

of seed words combined with conjunction rules for

extracting entities and opinion words. Then, through

an iterative process, the new pairs of entities/opinion

words are used for ﬁnding more pairs and ends when

no new entities or opinion words are found. Evalua-

tion on reviews dataset showed that this method out-

performs other state of the art approaches (such as the

one in (Hu and Liu, 2004a)).

However, not always opinion words have the same

polarity, even in the same domain. For instance, in a

laptop review, ”the battery is long” is identiﬁed as

positive whereas ”it takes to long to start” is associ-

ated with a negative sentiment. So, to avoid erroneous

sentiment classiﬁcation, the use of entity level sen-

timent analysis techniques and the extraction of the

ternary (word,entity, sentiment) was proposed for lex-

icon expansion (Ding et al., 2008).

Besides reviews, social networks have been ex-

plored for corpus based lexicon expansion. As a mat-

ter of fact, many social networks have speciﬁc opin-

ion words that are normally not covered by the gen-

eral sentiment lexicons (e.g. ”ahahahah”, ”LOL”,

”OMG”, ”#hatemonday”). The study in (Bravo-

Marquez et al., 2015a) present two models for cre-

ating a Twitter speciﬁc lexicon from a unlabelled

corpus of tweets using tweet-centroid word vectors.

The lexicon is classiﬁed into Positive, Neutral and

Negative scores. Another work by the same au-

thors (Bravo-Marquez et al., 2015b) presents a super-

vised algorithm for lexicon expansion using tweets

label with emoticons and a combination of several

seed word lexicons. Other supervised approach (Tang

et al., 2014) uses SkipGram (for learning continuous

phrases representation) and a seed lexicon (expanded

with contents from the Urban Dictionary (Dictionary,

2016)) as training data for a sentiment lexicon ex-

pansion classiﬁer. One more study (Du et al., 2010)

shapes the information bottleneck method with cross-

domain and inter-domain knowledge to extract a do-

main oriented lexicon.

A rather different approach is the one presented

in (Feng et al., 2011). Whereas most of the methods

presented focus on expanding sentiment lexicons with

adjectives and verbs, Feng et al. study the inﬂuence of

words with connotative polarity such as cancer, pro-

motion and tragedy. Furthermore, they also use an

unusual graph approach which incorporates with the

PageRank algorithm and a seed of opinion words to

propose a connotative lexicon creation system.

In fact, the majority of works study how to expand

sentiment lexicons with verbs and adjectives. In some

contexts, nouns may also imply opinion. For exam-

ple in the mattress review ”Within a month, a valley

formed in the middle of the mattress” or in the tablet

review ”It came with a scratch in the screen”. The

authors in (Zhang and Liu, 2011) study nouns that

may imply sentiment in product features. The study

relies on an seed lexicon to identify the sentiment on

reviews and then select candidates for feature nouns

that suggest opinion.

The detection of sentiment in words other than

adjectives and verbs is yet an understudied research

area. Therefore, in this work it is the exploration of

assigning sentiment to connotative words, nouns that

imply opinion, entities and topics that it will be high-

lighted. We intend to expand even more the senti-

ment lexicons in this studies by using public opinion

as a measure of polarity, combining Twitter sentiment

analysis and lexicon expansion methods to create new

domain and time speciﬁc sentiment dictionaries.

3 WORKFLOW DESCRIPTION

In the following section we describe the workﬂow

of our lexicon expansion proposal. We select terms

from seven different domains: world, health, enter-

tainment, politics, business, sports, and technology.

For each domain we have a set of RSS URLs from

several news websites in the English language (e.g.

CNN, BBC, The New York Times). In each RSS

feed, only the headline for each news is extracted

since: 1) it summarizes the full article and 2) its short

length provides an easier ﬁltering of irrelevant words

or terms. This way, we create a text corpus composed

only of news headlines, from several sources, for each

domain.

3.1 Term Extraction

For each domain corpus, we construct a term-

document frequent matrix and retrieve the most fre-

quent occurrences of 1-grams (words), 2-grams (two

word terms) and 3-grams (three word terms).

Lexicon Expansion System for Domain and Time Oriented Sentiment Analysis

465

The terms we deﬁne as ”frequent” rely on the

number of sources we have for the domain and the

grams we are considering. The formula used to deter-

mined the frequency threshold (and therefore to de-

cide if a term should be included in the lexicon) is

presented in (1).

frequency threshold

d,i

= n

× a

(1)

where n

is the number of sources for domain d and

represents the percentage of the cut in each i-gram.

In other words, if a term occurs more than the fre-

quency threshold variable it is included in the lexicon.

The values for a

were reached experimentally and are

presented in (2).

= 0.50; a

= 0.30; a

= 0.25 (2)

It is important to ﬁlter some of the ”noisy” terms

(i.e. terms that are irrelevant for sentiment analysis)

from the list extracted. The 1-gram are the ones that

commonly have the most noisy data. Several ﬁlters

are applied in order to reduce it. First, the words

are classiﬁed with the OpenNLP Parts-of-Speech tag-

ger (Apache, 2010). Then, only words classiﬁed as

nouns, foreign words and adjectives are kept. Verbs

are excluded due to the lack of context. For exam-

ple, ”wins” or ”lost” are generally associated with a

positive and negative sentiment, respectively. How-

ever, if we know to whom, or what, it refers to (e.g

”Trump wins” or ”Hillary lost”) then the public sen-

timent of the term may not be the same. Then, a

list of domains of speciﬁc words is used to remove

1-grams that do not infer any particular sentiment.

We use Topic Dictionaries from (Oxford, 2016) to

achieve that purpose. We left, however, words that

refer to corporations and entities (e.g. ”Apple” and

”Microsoft” in technology domain). In addition, we

also remove words that are common in the news (e.g.

”review”, ”tech”, ”news”). Furthermore, words that

are repeated in plural form (”syrian”/”syrians”) and

with apostrophe (”Trump”/”Trump’s”) are only kept

in singular and non-apostrophe form. We also re-

move words that are in the AFINN (Nielsen, 2011)

sentiment lexicon because those words by themselves

already express sentiment.

Since the number of 2-grams and 3-grams terms

obtained are less than the number of 1-grams and, be-

cause they appear frequently together (meaning that

they already have relevance in the domain), only the

plural and apostrophes ﬁlter is applied. We then send

the terms to Twitter where a last ﬁlter on our ﬁnal

terms list is used. This ﬁlter relies on the number of

tweets found on the terms. If it is lower than a deﬁned

threshold, it will not be included in the terms list.

The number of extracted terms is dynamic and

highly depends on the relevance that they have in

news media. In our work, we consider the extracted

terms relevant since: 1) they appear multiple times in

the same domain in several different news sources,

and because 2) when querying Twitter there is a

signiﬁcantly high number of tweets regarding those

terms on the same day they are extracted from the

news sources. Our method requires that there is a

minimum number of tweets that include the term. If

that number is not fulﬁlled, the term is removed since

it is likely to be irrelevant or syntactically incomplete

(e.g. from the headline ”Zika virus found in Mon-

tana”, the term ”found in” is not relevant).

3.2 Term Sentiment Analysis

To evaluate the sentiment of each term extracted from

the headlines of RSS news feeds, we use the Twit-

ter Search API (Twitter, 2016). Unlike similar stud-

ies who evaluate the sentiment of terms in the com-

ments from users on news site (Moreo et al., 2012)

or who select speciﬁc keywords from Twitter streams

(Wang et al., 2012; Nguyen et al., 2012), our system

uses a combination of both approaches. Using tweets,

we guarantee that the opinions retrieved are not com-

pletely anonymous (like in the majority of news web-

site) and therefore hate, advertise and insulting com-

ments are less common.

In addition, several works have proved good re-

sults using Twitter for topic tracking (Wang et al.,

2012; Amer-Yahia et al., 2012; Phuvipadawat and

Murata, 2010) and Twitter users tend to react quickly

to the occurrence of events which lead to several tech-

niques for detecting real-time events on the social net-

work (Atefeh and Khreich, 2015).

Moreover, using news headlines to search for

terms to track in each domain, we guarantee that they

are up to date and are relevant.

When the extraction procedure for all the domain

ﬁnishes, the resulting terms are searched in Twitter

and a sample is retrieved for each one of them. In our

experiments we use a sample of 500 tweets for each

term. In order to keep the term sentiment updated, the

tweets extracted must be posted in the same day as the

keyword extraction. In addition, we only get the more

recent tweets. Furthermore, in order to remove tweets

that can be from Twitter accounts who belong to news

sites or newspapers, we apply a ﬁlter in our query that

does not retrieve tweets containing links. This is due

to the nature of tweets posted by news site accounts,

which contain the link for accessing the full news in

the correspondent website.

Using the Twitter API, the number of tweets ex-

tracted for each term is not always the same as re-

quest. This can also be used as a last ﬁlter for terms

KDIR 2016 - 8th International Conference on Knowledge Discovery and Information Retrieval

466

extraction. In fact, if a keyword does not retrieve a

minimum amount of tweets, this can be interpreted

as non relevance or lack of meaning of the keyword.

So in our system, if the keyword searched in Twitter

does not retrieve a minimum number of tweets, it is

discarded. In our experimental setup we used 33%

of the sample as the threshold for not discarding the

term.

The next step is to perform sentiment analysis on

each tweet from the sample. We separate our method

in two components: the syntactic analysis of the tweet

and the identiﬁcation and assessment of possible emo-

jis and emoticons present in the tweet. Several studies

(Go et al., 2009; Novak et al., 2015) provide evidence

that emojis and emoticons are used as sentiment clues.

In fact, emoticons were already used as classiﬁers of

the polarity of the tweet since they are not speciﬁc to

a certain domain (Hogenboom et al., 2013).

In order to evaluate emojis we use the results from

(Novak et al., 2015) where the authors assess the sen-

timent of each emoji. As for the emoticons we con-

sider the sentiment classiﬁcation used in (Hogenboom

et al., 2015). In our system, the emojis and emoti-

cons in the tweet have the same sentiment impact in-

dependently of the position where they occur. All the

emojis/emoticons are considered and repetitions are

not discarded. The sentiment is calculated by simple

average (summing all the identiﬁed emojis and emoti-

cons and dividing by the number of occurrences).

The syntactic component of our analysis is to eval-

uate the sentiment on the text of each tweet. With

the goal of not inducing wrong sentiment, we remove

the term queried from each tweet. Hence, terms like

”Trump wins” which already have a positive senti-

ment associated (due to the word ”wins”) do not skew

our analysis. To determine the sentiment on the re-

maining words from the tweet, the general sentiment

lexicon AFINN (Nielsen, 2011) is used. We choose

this lexicon because, unlike more classical proposals

(Hu and Liu, 2004b) in which sentiment words are

classiﬁed only in a polarity fashion, AFINN provides

2477 words classiﬁed with sentiment in a [−5, 5] in-

terval. In addition to the sentiment lexicon, we also

use lists of ampliﬁers (e.g. ”very”, ”extremely”,

”more”) and attenuation (e.g. ”few”,”little”,”rarely”)

words for better sentiment analysis. These assign a

weight of 80% on the word polarity. Furthermore, we

use a list a words that reverse the polarity (e.g. ”not”,

”nobody”, ”never”).

Due to the tweets limitation of 140 characters, the

sentiment value of a word is affected if there is any

element of the lists mentioned in the 4 words before

and/or in the following 2. In other words, for each

opinion word found in the tweet we create a cluster

with the previous 4 words and the next 2. Then, we

verify if any of them match the words in the ampli-

ﬁcation, attenuation or negation lists and assign the

sentiment accordingly. The syntactic sentiment score

of each tweet is calculated combining the lexicon and

method previously mentioned. The ﬁnal score for

each tweet is a weighted average of 75% the text sen-

timent analysis and 25% the emoticon/emoji senti-

ment analysis. The assumption is that the average

sentiment of the sample represents the overall senti-

ment of the searched term. Therefore, the term senti-

ment score is calculated by summing up the sentiment

score of each tweet from the term corpus and dividing

it by the number of tweets in that corpus.

4 EVALUATION

4.1 Tweets Sentiment Analysis

Since one of our goals is to assess if the polarity

of the term can be obtained from the average senti-

ment of a sample of tweets containing the term, it is

important to have a accurate tweet sentiment classi-

ﬁer. Therefore, we evaluate and compare our polarity

classiﬁcation method in several datasets provided by

CrowdFlower (in their ”Data for Everyone” library).

Five datasets of tweets, classiﬁed with sentiment by

human coders were used. A brief explanation on

each dataset follows (for more details please refer to

(Crowdﬂower, 2016)):

• GOP: contains over ten thousand tweets about the

GOP debate in Ohio. Workers classiﬁed the sen-

timent of each tweet as Positive, Neutral or Nega-

tive.

• SDC: includes approximately 7000 tweets about

self driving cars. Workers were asked to clas-

sify the sentiment as Very Positive, Slightly Pos-

itive, Neutral, Slightly Negative, Very Negative.

We converted this to a Positive/Neutral/Negative

scale.

• USAIR: dataset with around 16000 tweets about

major US airlines. Contributors were asked to as-

sign a Neutral, Positive or Negative sentiment to

each tweet.

• COACH: dataset with 3847 tweets with reactions

to the 2015 Coachella festival lineup announce-

ment. Workers classiﬁed the sentiment of each

tweet as Neutral, Positive or Negative

• APPLE: 4000 tweets containing references to

the Apple company. Sentiment classiﬁcation was

done with a Negative, Neutral and Positive scale.

Lexicon Expansion System for Domain and Time Oriented Sentiment Analysis

467

We are interested in determining the polarity in

2 classes (positive/negative) of each of the extracted

terms. Therefore, we discarded the neutral tweets

from the datasets. The results of that score in terms

of, precision, recall, f1-measure and accuracy can be

examined in Table 1.

Table 1: Results in terms of precision (Prec.), recall (Rec.),

F1-Measure (F1), and accuracy (Acc).

Dataset

Prec. (%) Rec. (%) F1 (%) Acc. (%)

Pos. Neg. Pos. Neg. Pos. Neg. Pos.+Neg.

GOP 32.8 90.9 78.3 57.5 46.2 70.4 61.8

SDC 82.5 53.9 76.4 63.1 79.3 58.1 72.3

USAIR 39.4 97.4 94.9 56.9 55.7 71.9 65.6

COACH 85.3 42.5 74.2 59.7 79.3 49.6 70.7

APPLE 55.4 93.9 86.8 74.4 67.7 83.0 77.7

When analyzing each of the datasets, the senti-

ment component of our system seems to achieve bet-

ter performance in tweets regarding the technology

domain (SDC and APPLE). However, variation on

accuracy values does not surpass 20% which gives a

solid support that our method will perform well in-

dependently of the tweets domain. Accuracy reaches

the lowest value in the GOP dataset. Similar conclu-

sion was reached in (Thelwall et al., 2012) where the

authors assess the low performance on some web ex-

tracted datasets due to political and controversial top-

ics. In addition, we compare our sentiment compo-

nent (SC) to other state-of-art methods to check if it

was able to match them as 2-class (positive/negative)

accuracy is concerned. A brief description on each

system follows:

• Emolex: Manually created emotion lexicon using

crowd-sourcing. The terms were extracted from a

combination of Macquarie Thesaurus, General In-

quirer and WordNet Affect Lexicon (Mohammad

and Turney, 2010). Although the words were clas-

siﬁed with emotion and polarity, only the second

was used for this method.

• SenticNet: Assigns sentiment to common sense

concepts to achieve a semantic sentiment analysis

approach rather than the most common sentence

level (Cambria et al., 2014).

• SentiStrength: Combines a manually annotated

sentiment lexicon, machine learning algorithms

and other important features like negation words

and repeated punctuation for sentiment enhance.

It provides the best results in gold standard tweet

datasets (Thelwall et al., 2012).

The results can be seen in Table 2.

Although it is not the best system when com-

pared to other state of the art approaches, our sen-

timent component still performs well on the differ-

ent datasets achieving the best accuracy in 2 of them.

Table 2: Comparison of the sentiment component (SC) of

our system with other state of the art approaches.

Sentiment

System

2-Class Dataset Accuracy

GOP SDC USAIR COACH APPLE Average

SC 61.8 72.3 65.6 70.1 77.7 69.6

Emolex 46.0 64.9 46.9 65.6 70.3 58.7

SenticNet 37.3 68.5 39.1 74.7 46.9 53.3

SentiStrength 70.4 70.1 76.5 73.3 74.5 73.1

In addition only is beaten by 4% margin by Sen-

tiStrength when assessing the overall accuracy.

In conclusion, these results provide a good sup-

port for the reliability on tweet classiﬁcation of our

system.

4.2 System Evaluation

In order to evaluate our system we carried out two ex-

perimental surveys. The ﬁrst had the goal of assessing

the effectiveness of our proposal in extracting relevant

terms for each of the domains. The second survey was

to evaluate if the sentiment assigned to each term was

still accurate at present time.

The survey was conducted in a web application

built for the effect. The question asked was ”Con-

sidering the present time (and current news), does

the term x ﬁts the domain y ?” where x and y were

replaced randomly by the entries extracted from our

system. Since our goal was solely to test our extrac-

tion method we allow users to classify an unrestricted

number of entries. We obtained a total of 1414 en-

tries classiﬁed by 57 different users consisting mostly

of university students. When evaluating the ﬁtness of

the term in the domain we discarded all the entries of

users whose response was ”I don’t know”. In the re-

maining 1336 entries we had an accuracy of 88.2%.

This provides strong empirical evidence for our term

selection method.

The second part of our study was to determine if

Twitter sentiment on an extracted term reﬂects the

current sentiment of the term. To assess this we

used Crowdﬂower to conduct a sentiment survey. We

used terms extracted from 2016-04-01 till 2016-04-

03. The experience began on 2016-04-04 at approx-

imately 3:15 pm and took 30 hours to complete. We

submitted 101 pairs of terms/domain extracted ran-

domly (but in equal number for each domain) from

the daily retrieved dictionaries. Each of those terms

was evaluated by 7 different workers with a level 3

performance. This level is assigned to workers who

achieved high accuracy values in more than a hun-

dred test questions (Crowdﬂower, 2014). The ques-

tion asked in this CrowdFlower survey was: ”Consid-

ering the present time (and current news) and the do-

main x, please rate the sentiment associated with the

KDIR 2016 - 8th International Conference on Knowledge Discovery and Information Retrieval

468

expression y” where x is the domain and y the term.

The scale provided ranged from 1 (very negative) to

5(very positive). Although we are trying to assess the

general polarity of the term, we used a likert scale

to force workers to have a more careful decision on

which sentiment to choose, avoiding a randomly (and

easiest) choice.

We used the median as measure to determine our

ground truth for each term since the average value

could be highly inﬂuenced by possible outliers. For

example, if six workers evaluate the term with a 2 and

a worker with a 5 the average value would result in a

ﬁnal sentiment of 3 (neutral value). Using the median

the ﬁnal sentiment would result in a more realistic 2

(negative polarity).

We converted the results to ﬁt our polarity scale.

Values below 3 were classiﬁed as negatives and above

as positives. Once again, we discarded the neutral val-

ues since our system assigns a positive/negative out-

put for each term. We notice however, that the number

of terms classiﬁed as neutral was signiﬁcantly high

(around 40% of our sample). This experimental re-

sults suggest that an implementation of a neutral clas-

siﬁcation must be accounted in future work.

As it was already mention, there are two types of

automatic lexicon expansion methods: thesaurus and

corpus based. However (and although we consider

our approach to ﬁt the corpus based category), our

system cannot be compared to any of those methods.

This is because traditional corpus based methods fo-

cus solely in one corpus and retrieve the sentiment

words of it. However, our proposed method, gener-

ates a corpus for each extracted term. Furthermore,

most of the state of the art approaches focus in re-

trieving opinion words classiﬁed majorly as adjectives

and verbs. Consequently, any term comparisons with

other methods is hard to achieve. Therefore, we com-

pare our results against a random baseline (achieved

with the best overall accuracy of 5 attempts) and a

majority baseline (which classiﬁes all terms as the the

class who is more frequent). The results are presented

in Table 3.

Table 3: Comparison of results of our system (SR) against

a random baseline(Rbl) and a majority baseline (Mbl).

Prec.(%) Rec.(%) F1(%) Acc.(%)

Pos. Neg. Pos. Neg. Pos. Neg. Pos+Neg

SR 74.36 90.00 93.55 64.29 82.86 75.00 79.67

Rbl 65.71 66.67 74.19 57.14 69.70 61.54 66.10

Mbl 52.54 NA 100 0 68.89 NA 52.54

Experimental results show good overall accuracy

of 79.7%. A closer analysis on the predictions of the

system has revealed a particularly low performance

on political terms. This is presumably because sev-

eral of the used terms have a rather controversial sen-

timent. As an example we have ”abortion”, ”national

living wage”, and political candidates in US elec-

tions such as ”Donald Trump”, ”Hillary Clinton” or

”Bernie Sanders”. In the entertainment domain the

results are much better, missing solely in ”batman v

superman”.

We are aware that our experiments involved a

small number of terms. However, since we are evalu-

ating time and domain speciﬁc terms, including more

terms in our analysis from extractions further back in

the past would not correspond to what we are trying

to assess. We also considered extending to more do-

mains but deﬁning the ”ground truth” sentiment in

domains which have a narrowed scope could result

in more neutral classiﬁcations due to unfamiliarity of

the term to the workers.

5 CONCLUSIONS AND FUTURE

WORK

In this work we have described a system for automat-

ically extracting the more relevant terms from seven

different domains and to classify their sentiment in

a positive/negative scale. Our proposal retrieves the

more frequent terms from news headlines using RSS

feeds from several news sources. We then query Twit-

ter with the same terms and infer their polarity using

the average sentiment classiﬁcation obtained from the

sample of tweets.

Our experiments shown that the proposed term

extraction component is rather effective, achieving a

80% accuracy. Some of the limitations of our method

are due to the accuracy of the used NLP classiﬁer that

lead to some noisy unigram terms. Future research

will try to explore more ﬁlters for a ﬁne grain selec-

tion of unigrams in the different domains. Possible ﬁl-

ters may include the use of different NLP classiﬁers to

determine the part of speech tags and use name entity

recognition techniques to infer terms that are refer-

ring to the same entity (e.g. ”Obama”and ”POTUS”).

We also plan to uncover the relations between the 1-

gram, 2-gram and 3-gram lists. For example, although

the terms ”april fools’”, ”fools’ day” and ”april fools

day” are expected to have similar polarity, the terms

”Syria” and ”Syria ceaseﬁre” are not.

Our sentiment classiﬁer also produced good re-

sults in detecting the polarity of tweets from several

different domains. Tests on labelled Twitter datasets

achieved an overall accuracy ranging from 61.8%

(GOP dataset) to 77.7% (APPLE dataset). Further-

more, when compared to other state of the art systems

for sentiment analysis, it was only surpassed by Sen-

tiStrength by a minimal 4% margin.

Lexicon Expansion System for Domain and Time Oriented Sentiment Analysis

469

The two preliminary evaluation experiments we

have described have provided strong evidence on

the validity of our approach. Experimental re-

sults using Crowdﬂower lead to an overall accuracy

of 79.67% with positive terms achieving better f1-

measure (82.86%) than the negative ones (75.00%).

In future work, we will use the results from our sys-

tem to complement and expand sentiment lexicons for

domain and time speciﬁc contexts. We intend to as-

sess if these lexicons can improve sentiment classi-

ﬁcation on dictionary-based approaches speciﬁcally

on short informal texts (like tweets or website com-

ments).

ACKNOWLEDGEMENTS

This work is supported by the ERDF European

Regional Development Fund through the COM-

PETE Programme (operational programme for com-

petitiveness) and by National Funds through the

FCT (Portuguese Foundation for Science and Tech-

nology) within project Reminds/UTAP-ICDT/EEI-

CTP/0022/2014.

REFERENCES

Amazon (2016). Amazon mechanical turk. https://

www.mturk.com/mturk/welcome. Acessed: 2016-08-

21.

Amer-Yahia, S., Anjum, S., Ghenai, A., Siddique, A., Ab-

bar, S., Madden, S., Marcus, A., and El-Haddad, M.

(2012). MAQSA: a system for social analytics on

news. In Proceedings of the ACM SIGMOD Interna-

tional Conference on Management of Data, SIGMOD

2012, Scottsdale, AZ, USA, May 20-24, 2012, pages

653–656.

Apache (2010). Apache OpenNLP . https://opennlp.

apache.org/. Acessed: 2016-08-21.

Atefeh, F. and Khreich, W. (2015). A survey of techniques

for event detection in twitter. Computational Intelli-

gence, 31(1):132–164.

Baccianella, S., Esuli, A., and Sebastiani, F. (2010). Senti-

wordnet 3.0: An enhanced lexical resource for sen-

timent analysis and opinion mining. In Chair), N.

C. C., Choukri, K., Maegaard, B., Mariani, J., Odijk,

J., Piperidis, S., Rosner, M., and Tapias, D., editors,

Proceedings of the Seventh International Conference

on Language Resources and Evaluation (LREC’10),

Valletta, Malta. European Language Resources Asso-

ciation (ELRA).

Bravo-Marquez, F., Frank, E., and Pfahringer, B. (2015a).

From unlabelled tweets to twitter-speciﬁc opinion

words. In Proceedings of the 38th International ACM

SIGIR Conference on Research and Development in

Information Retrieval, SIGIR ’15, pages 743–746,

New York, NY, USA. ACM.

Bravo-Marquez, F., Frank, E., and Pfahringer, B. (2015b).

Positive, negative, or neutral: Learning an expanded

opinion lexicon from emoticon-annotated tweets. In

Proceedings of the Twenty-Fourth International Joint

Conference on Artiﬁcial Intelligence, IJCAI ’15.

Cambria, E., Olsher, D., and Rajagopal, D. (2014). Sentic-

net 3: A common and common-sense knowledge base

for cognition-driven sentiment analysis. In Proceed-

ings of the Twenty-Eighth AAAI Conference on Artiﬁ-

cial Intelligence, AAAI’14, pages 1515–1521. AAAI

Press.

Crowdﬂower (2014). Introducing contributor performance

levels. http:// crowdﬂowercommunity.tumblr.com/

post/80598014542/introducing-contributor-

performance-levels. Acessed: 2016-04-10.

CrowdFlower (2016). Crowdﬂower: Make your data use-

ful. https://www.crowdﬂower.com/. Acessed: 2016-

08-21.

Crowdﬂower (2016). Data for everyone. http://

www.crowdﬂower.com/data-for-everyone. Acessed:

2016-04-10.

Dictionary, U. (2016). Urban dictionary. www. urbandic-

tionary.com. Acessed: 2016-08-21.

Ding, X., Liu, B., and Yu, P. S. (2008). A holistic lexicon-

based approach to opinion mining.

Du, W., Tan, S., Cheng, X., and Yun, X. (2010). Adapt-

ing information bottleneck method for automatic con-

struction of domain-oriented sentiment lexicon. In

Proceedings of the Third ACM International Confer-

ence on Web Search and Data Mining, WSDM ’10,

pages 111–120, New York, NY, USA. ACM.

Esuli, A. and Sebastiani, F. (2006). Sentiwordnet: A pub-

licly available lexical resource for opinion mining. In

In Proceedings of the 5th Conference on Language

Resources and Evaluation (LREC06, pages 417–422.

Fellbaum, C., editor (1998). WordNet: an electronic lexical

database. MIT Press.

Feng, S., Bose, R., and Choi, Y. (2011). Learning general

connotation of words using graph-based algorithms.

In Proceedings of the Conference on Empirical Meth-

ods in Natural Language Processing, EMNLP ’11,

pages 1092–1103, Stroudsburg, PA, USA. Associa-

tion for Computational Linguistics.

Go, A., Bhayani, R., and Huang, L. (2009). Twitter senti-

ment classiﬁcation using distant supervision. CS224N

Project Report, Stanford, 1:12.

Hatzivassiloglou, V. and McKeown, K. R. (1997). Predict-

ing the semantic orientation of adjectives. In Pro-

ceedings of the 35th Annual Meeting of the Associa-

tion for Computational Linguistics and Eighth Con-

ference of the European Chapter of the Association

for Computational Linguistics, ACL ’98, pages 174–

181, Stroudsburg, PA, USA. Association for Compu-

tational Linguistics.

Hogenboom, A., Bal, D., Frasincar, F., Bal, M., de Jong,

F., and Kaymak, U. (2013). Exploiting emoticons in

sentiment analysis. In Proceedings of the 28th An-

KDIR 2016 - 8th International Conference on Knowledge Discovery and Information Retrieval

470

nual ACM Symposium on Applied Computing, SAC

’13, pages 703–710, New York, NY, USA. ACM.

Hogenboom, A., Bal, D., Frasincar, F., Bal, M., De Jong, F.,

and Kaymak, U. (2015). Exploiting emoticons in po-

larity classiﬁcation of text. J. Web Eng., 14(1-2):22–

40.

Hu, M. and Liu, B. (2004a). Mining and summariz-

ing customer reviews. In Proceedings of the Tenth

ACM SIGKDD International Conference on Knowl-

edge Discovery and Data Mining, KDD ’04, pages

168–177, New York, NY, USA. ACM.

Hu, M. and Liu, B. (2004b). Mining opinion features in

customer reviews. In Proceedings of the 19th National

Conference on Artiﬁcal Intelligence, AAAI’04, pages

755–760. AAAI Press.

Hutto, C. J. and Gilbert, E. (2014). Vader: A parsimonious

rule-based model for sentiment analysis of social me-

dia text. In Adar, E., Resnick, P., Choudhury, M. D.,

Hogan, B., and Oh, A. H., editors, ICWSM. The AAAI

Press.

Kim, S.-M. and Hovy, E. (2004). Determining the sentiment

of opinions. In Proceedings of the 20th International

Conference on Computational Linguistics, COLING

’04, Stroudsburg, PA, USA. Association for Compu-

tational Linguistics.

Kiritchenko, S., Zhu, X., and Mohammad, S. M. (2014).

Sentiment analysis of short informal texts. J. Artif.

Int. Res., 50(1):723–762.

Mohammad, S., Dunne, C., and Dorr, B. (2009). Generat-

ing high-coverage semantic orientation lexicons from

overtly marked words and a thesaurus. In Proceed-

ings of the 2009 Conference on Empirical Methods in

Natural Language Processing: Volume 2 - Volume 2,

EMNLP ’09, pages 599–608, Stroudsburg, PA, USA.

Association for Computational Linguistics.

Mohammad, S. M. and Turney, P. D. (2010). Emotions

evoked by common words and phrases: Using me-

chanical turk to create an emotion lexicon. In Pro-

ceedings of the NAACL HLT 2010 Workshop on Com-

putational Approaches to Analysis and Generation of

Emotion in Text, CAAGET ’10, pages 26–34, Strouds-

burg, PA, USA. Association for Computational Lin-

guistics.

Moreo, A., Romero, M., Castro, J., and Zurita, J. (2012).

Lexicon-based comments-oriented news sentiment

analyzer system. Expert Syst. Appl., 39(10):9166–

9180.

Nguyen, L. T., Wu, P., Chan, W., Peng, W., and Zhang,

Y. (2012). Predicting collective sentiment dynamics

from time-series social media. In Proceedings of the

First International Workshop on Issues of Sentiment

Discovery and Opinion Mining, WISDOM ’12, pages

6:1–6:8, New York, NY, USA. ACM.

Nielsen, F. A. (2011). Aﬁnn.

Novak, P. K., Smailovic, J., Sluban, B., and Mozetic, I.

(2015). Sentiment of emojis. CoRR, abs/1509.07761.

Oxford (2016). Oxford Learner’s Dictionaries topic dic-

tionaries. http://www.oxfordlearnersdictionaries.com/

topic/. Acessed: 2016-07-03.

Phuvipadawat, S. and Murata, T. (2010). Breaking news

detection and tracking in twitter. In Web Intel-

ligence and Intelligent Agent Technology (WI-IAT),

2010 IEEE/WIC/ACM International Conference on,

volume 3, pages 120–123.

Qiu, G., Liu, B., Bu, J., and Chen, C. (2011). Opinion word

expansion and target extraction through double prop-

agation. Comput. Linguist., 37(1):9–27.

Stone, P. J., Dunphy, D. C., Smith, M. S., and Ogilvie, D. M.

(1966). The General Inquirer: A Computer Approach

to Content Analysis. MIT Press, Cambridge, MA.

Taboada, M., Brooke, J., Toﬁloski, M., Voll, K., and Stede,

M. (2011). Lexicon-based methods for sentiment

analysis. Comput. Linguist., 37(2):267–307.

Tang, D., Wei, F., Qin, B., Zhou, M., and Liu, T. (2014).

Building large-scale twitter-speciﬁc sentiment lexi-

con : A representation learning approach. In COL-

ING 2014, 25th International Conference on Compu-

tational Linguistics, Proceedings of the Conference:

Technical Papers, August 23-29, 2014, Dublin, Ire-

land, pages 172–182.

Thelwall, M., Buckley, K., and Paltoglou, G. (2012). Senti-

ment strength detection for the social web. J. Am. Soc.

Inf. Sci. Technol., 63(1):163–173.

Twitter (2015a). Twitter Company about. https://about.

twitter.com/company. Acessed: 2015-10-19.

Twitter (2015b). Twitter Company rest. https://dev.

twitter.com/rest/public. Acessed: 2015-10-19.

Twitter (2016). Twitter Developers. https://dev.twitter.com/.

Acessed: 2016-03-08.

Wang, H., Can, D., Kazemzadeh, A., Bar, F., and

Narayanan, S. (2012). A system for real-time twit-

ter sentiment analysis of 2012 u.s. presidential elec-

tion cycle. In Proceedings of the ACL 2012 System

Demonstrations, ACL ’12, pages 115–120, Strouds-

burg, PA, USA. Association for Computational Lin-

guistics.

Zhang, L. and Liu, B. (2011). Identifying noun product fea-

tures that imply opinions. In Proceedings of the 49th

Annual Meeting of the Association for Computational

Linguistics: Human Language Technologies: Short

Papers - Volume 2, HLT ’11, pages 575–580, Strouds-

burg, PA, USA. Association for Computational Lin-

guistics.

Lexicon Expansion System for Domain and Time Oriented Sentiment Analysis

471