and mining different aspects of a tweet. The features derived from tweets are text-based, hashtag-based, text concatenated with hashtags, and text with hashtags stripped of the # indicator. All of them are merged with the sentiment label provided by the VADER lexicon;
• a comprehensive study of the enhancement, consisting of applying the whole process to several datasets and considering multiple scenarios.
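The VADER labeling step mentioned in the contribution list can be sketched as follows. This is an illustrative stand-in, not the actual VADER implementation: `TINY_LEXICON`, the `label_tweet` helper, and the score squashing are hypothetical, and only the ±0.05 compound-score thresholds follow VADER's documented convention.

```python
# Illustrative stand-in for the VADER labeling step (not the real VADER
# implementation): TINY_LEXICON and the /4.0 squashing are hypothetical;
# only the +/-0.05 compound-score thresholds follow VADER's convention.
TINY_LEXICON = {"great": 3.1, "love": 3.2, "bad": -2.5, "hate": -2.7}

def label_tweet(text: str, threshold: float = 0.05) -> str:
    """Attach the sentiment label that is merged into every feature set."""
    tokens = text.lower().split()
    score = sum(TINY_LEXICON.get(tok, 0.0) for tok in tokens)
    compound = max(-1.0, min(1.0, score / 4.0))  # rough squash into [-1, 1]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

In the proposed system, the resulting label is appended to each of the four tweet-derived feature variants rather than used as a standalone prediction.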
The remainder of the paper is organized as follows. Section 2 presents various approaches that use several features or analyze the impact of features on Twitter Sentiment Analysis. Section 3 gives an overview of the system, and Section 4 describes the proposed approach in detail. Section 5 reports the numerical experiments and results, and Section 6 compares the viewpoints. Conclusions and future work are drawn in the last section.
2 RELATED WORK
Developing systems for Sentiment Analysis and evaluating the impact of feature sets or of different types of input data models have been approached in various ways in the literature. Therefore, this section presents several perspectives that focus on the features and models used for the Twitter polarity classification problem.
The approach of Chiong et al. (Chiong et al., 2021) aims to detect depression hidden in tweets. The posted messages are analyzed based on a combination of features: components resulting from a sentiment lexicon mixed with content-based, Twitter-specific features. The datasets from Shen's and Eye's perspectives (Shen et al., 2017) are used for the methodology. Tweets are marked as indicating "Depression" (negative sentiment) or "Non-depression" (positive view). Six feature groups are
defined for the depression detection task. Three
groups contain features based on the sentiment lexi-
cons, and three use platform-specific features. So, the
first three groups (A, B, and C) have attributes from
SentiWordNet and SenticNet libraries (e.g., number
of positive, negative, or neutral words). The remain-
ing groups have basic tweet information (e.g., the
number of words, the number of links), part-of-speech
(POS) features, and linguistic attributes (e.g., the ratio
of adverbs and adjectives, school-level indicator for
text understanding, etc.). After the feature extraction process, the data is split into training and test sets and passed to four classifiers: Support Vector Machine, Logistic Regression, Decision Tree, and Multilayer Perceptron. The best classifier is selected based on evaluation measures (accuracy, precision, recall, and F-score).
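The two feature families used in that work might be extracted roughly as below. This is a hypothetical sketch: the `POSITIVE` and `NEGATIVE` word sets stand in for the SentiWordNet/SenticNet lookups of groups A–C, and `extract_features` is an invented helper, not code from the paper.

```python
# Hypothetical sketch of the two feature families: the POSITIVE and NEGATIVE
# word sets stand in for the SentiWordNet / SenticNet lookups used in groups
# A-C, while the remaining entries are content-based tweet features.
POSITIVE = {"happy", "good", "hope"}
NEGATIVE = {"sad", "alone", "tired"}

def extract_features(tweet: str) -> dict:
    tokens = tweet.lower().split()
    return {
        # Lexicon-based features (groups A-C)
        "n_positive": sum(t in POSITIVE for t in tokens),
        "n_negative": sum(t in NEGATIVE for t in tokens),
        # Content-based, Twitter-specific features (remaining groups)
        "n_words": len(tokens),
        "n_links": sum(t.startswith("http") for t in tokens),
    }
```

The resulting feature dictionaries would then be vectorized and fed to the four classifiers mentioned above.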
The aim of Rani's perspective (Rani et al., 2021) is to analyze the impact of feature-set size on sentiment classification for the Twitter US Airline dataset.
Moreover, the feature selection technique is examined
to see what method best fits a polarity detection prob-
lem. The designed system collects the messages and
applies cleaning and preprocessing techniques. Af-
ter this phase, Chi-Square and Information Gain are
used as feature selection techniques for defining fea-
ture sets with different dimensions. In addition, a
sentiment score is added to each feature set by using
a sentiment lexicon. The enhanced model is passed
to various machine learning classifiers (Naïve Bayes, SVM, or decision trees), and the results are evaluated via accuracy and the Kappa metric.
The approach from (Ayyub et al., 2020) applies Sentiment Analysis to determine the "relative frequency" of a sentiment label, a task called "sentiment quantification". This methodology is divided into two main
phases: sentiment classification task and computing
the frequency of the target class, also known as the
class of interest. The analysis aims to determine the
impact of linguistic features on the whole process
and compare different classification techniques. The
designed system handles three types of feature ex-
traction methods. Firstly, the bag of words is con-
verted into TF-IDF values. The second approach
uses n-grams (here, words have assigned probabili-
ties). The last experiment involves the combination of
the two methods. Stanford Twitter Sentiment, STS-Gold, and Sanders are used as datasets. The experi-
ments handle different feature sets based on the previ-
ously mentioned techniques and use different classi-
fiers such as traditional machine learning approaches
or deep learning. Moreover, absolute error and relative error are used as evaluation measures.
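The extraction settings and the classify-and-count quantification step can be sketched as below. The training data, the `LogisticRegression` choice, and all variable names are illustrative assumptions, not details taken from the paper; only the TF-IDF / n-gram / combination scheme and the absolute-error measure come from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Invented toy data; 1 marks the positive (target) class of interest.
train_docs = ["good phone", "bad battery", "good screen", "bad service"]
train_labels = [1, 0, 1, 0]
test_docs = ["good camera", "bad signal", "good price"]
true_prevalence = 2 / 3  # actual share of positives in the test set

# ngram_range=(1, 2) keeps unigram TF-IDF and adds word bigrams, i.e., the
# "combination" setting; ngram_range=(1, 1) would be plain TF-IDF.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
classifier = LogisticRegression().fit(
    vectorizer.fit_transform(train_docs), train_labels)

# "Classify and count": estimate the target-class frequency, then score it.
predictions = classifier.predict(vectorizer.transform(test_docs))
estimated_prevalence = predictions.sum() / len(predictions)
absolute_error = abs(estimated_prevalence - true_prevalence)
```

Relative error would divide `absolute_error` by `true_prevalence` instead.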
Onan (Onan, 2021) explores the sentiment classification issue for Turkish tweets. In addition, the study analyzes different word embedding-based features using supervised learning algorithms (e.g., Naïve Bayes, SVM) and ensemble learning techniques (e.g.,
AdaBoost, Random Subspace). The proposed system
defines nine weighting schemes. Two are unsupervised (TF-IDF and term frequency), and seven are super-
vised: odds ratio, relevance frequency, balanced dis-
tributional concentration, inverse question frequency-
question frequency-inverse category frequency, short
text weighting, inverse gravity moment, and regular-
ized entropy (Onan, 2021). Tweets were collected for two months via the Twitter API to build the dataset. A manual annotation phase determines whether a message is