and mining different aspects of a tweet. The features derived from tweets are text-based, hashtag-based, text concatenated with hashtags, and text with hashtags stripped of the # indicator. All of them are merged with the sentiment label provided by the VADER lexicon;
• a comprehensive study of the enhancement, consisting of applying the whole process to several datasets and considering multiple scenarios.
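The VADER labeling step mentioned in the contribution list can be sketched as follows. This is an illustrative stand-in, not the actual VADER implementation: `TINY_LEXICON`, the `label_tweet` helper, and the score squashing are hypothetical, and only the ±0.05 compound-score thresholds follow VADER's documented convention.

```python
# Illustrative stand-in for the VADER labeling step (not the real VADER
# implementation): TINY_LEXICON and the /4.0 squashing are hypothetical;
# only the +/-0.05 compound-score thresholds follow VADER's convention.
TINY_LEXICON = {"great": 3.1, "love": 3.2, "bad": -2.5, "hate": -2.7}

def label_tweet(text: str, threshold: float = 0.05) -> str:
    """Attach the sentiment label that is merged into every feature set."""
    tokens = text.lower().split()
    score = sum(TINY_LEXICON.get(tok, 0.0) for tok in tokens)
    compound = max(-1.0, min(1.0, score / 4.0))  # rough squash into [-1, 1]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

In the proposed system, the resulting label is appended to each of the four tweet-derived feature variants rather than used as a standalone prediction.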
The remainder of the paper is organized as follows. Section 2 presents various approaches that use several features or analyze the impact of features on Twitter Sentiment Analysis. Section 3 gives an overview of the system, and Section 4 describes the proposed approach in detail. Section 5 reports the numerical experiments and results, and Section 6 compares the viewpoints. Conclusions and future work are drawn in the last section.
2 RELATED WORK
Developing systems for Sentiment Analysis and evaluating the impact of feature sets or of different types of input data models have been approached in various ways in the literature. Therefore, this section presents several perspectives that focus on the features and models used for the Twitter polarity classification problem.
The approach of Chiong et al. (Chiong et al., 2021) aims to detect depression hidden in tweets. The posted messages are analyzed based on a combination of features: components resulting from a sentiment lexicon mixed with content-based, Twitter-specific features. The datasets from Shen's and Eye's perspectives (Shen et al., 2017) are used for the methodology. Tweets are marked as indicating "Depression" (negative sentiment) or "Non-depression" (positive view). Six feature groups are
defined for the depression detection task. Three
groups contain features based on the sentiment lexi-
cons, and three use platform-specific features. So, the
first three groups (A, B, and C) have attributes from
SentiWordNet and SenticNet libraries (e.g., number
of positive, negative, or neutral words). The remain-
ing groups have basic tweet information (e.g., the
number of words, the number of links), part-of-speech
(POS) features, and linguistic attributes (e.g., the ratio
of adverbs and adjectives, school-level indicator for
text understanding, etc.). After the feature extraction process, the data is split into training and test sets and passed to four classifiers: Support Vector Machine, Logistic Regression, Decision Tree, and Multilayer Perceptron. The best classifier is selected based on evaluation measures (accuracy, precision, recall, and F-score).
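The two feature families used in that work might be extracted roughly as below. This is a hypothetical sketch: the `POSITIVE` and `NEGATIVE` word sets stand in for the SentiWordNet/SenticNet lookups of groups A–C, and `extract_features` is an invented helper, not code from the paper.

```python
# Hypothetical sketch of the two feature families: the POSITIVE and NEGATIVE
# word sets stand in for the SentiWordNet / SenticNet lookups used in groups
# A-C, while the remaining entries are content-based tweet features.
POSITIVE = {"happy", "good", "hope"}
NEGATIVE = {"sad", "alone", "tired"}

def extract_features(tweet: str) -> dict:
    tokens = tweet.lower().split()
    return {
        # Lexicon-based features (groups A-C)
        "n_positive": sum(t in POSITIVE for t in tokens),
        "n_negative": sum(t in NEGATIVE for t in tokens),
        # Content-based, Twitter-specific features (remaining groups)
        "n_words": len(tokens),
        "n_links": sum(t.startswith("http") for t in tokens),
    }
```

The resulting feature dictionaries would then be vectorized and fed to the four classifiers mentioned above.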
The aim of Rani's perspective (Rani et al., 2021) is to analyze the impact of feature-set size on sentiment classification for the Twitter US Airline dataset.
Moreover, the feature selection technique is examined
to see what method best fits a polarity detection prob-
lem. The designed system collects the messages and
applies cleaning and preprocessing techniques. Af-
ter this phase, Chi-Square and Information Gain are
used as feature selection techniques for defining fea-
ture sets with different dimensions. In addition, a
sentiment score is added to each feature set by using
a sentiment lexicon. The enhanced model is passed
to various machine learning classifiers (Naïve Bayes, SVM, or decision trees), and the results are evaluated via accuracy and the Kappa metric.
The approach from (Ayyub et al., 2020) applies Sentiment Analysis to determine the "relative frequency" of a sentiment label, a task called "sentiment quantification". This methodology is divided into two main
phases: sentiment classification task and computing
the frequency of the target class, also known as the
class of interest. The analysis aims to determine the
impact of linguistic features on the whole process
and compare different classification techniques. The
designed system handles three types of feature ex-
traction methods. Firstly, the bag of words is con-
verted into TF-IDF values. The second approach
uses n-grams (here, words have assigned probabili-
ties). The last experiment involves the combination of
the two methods. Stanford Twitter Sentiment, STS-Gold, and Sanders are used as datasets. The experi-
ments handle different feature sets based on the previ-
ously mentioned techniques and use different classi-
fiers such as traditional machine learning approaches
or deep learning. Moreover, absolute error and relative error are used as evaluation measures.
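The extraction settings and the classify-and-count quantification step can be sketched as below. The training data, the `LogisticRegression` choice, and all variable names are illustrative assumptions, not details taken from the paper; only the TF-IDF / n-gram / combination scheme and the absolute-error measure come from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Invented toy data; 1 marks the positive (target) class of interest.
train_docs = ["good phone", "bad battery", "good screen", "bad service"]
train_labels = [1, 0, 1, 0]
test_docs = ["good camera", "bad signal", "good price"]
true_prevalence = 2 / 3  # actual share of positives in the test set

# ngram_range=(1, 2) keeps unigram TF-IDF and adds word bigrams, i.e., the
# "combination" setting; ngram_range=(1, 1) would be plain TF-IDF.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
classifier = LogisticRegression().fit(
    vectorizer.fit_transform(train_docs), train_labels)

# "Classify and count": estimate the target-class frequency, then score it.
predictions = classifier.predict(vectorizer.transform(test_docs))
estimated_prevalence = predictions.sum() / len(predictions)
absolute_error = abs(estimated_prevalence - true_prevalence)
```

Relative error would divide `absolute_error` by `true_prevalence` instead.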
Onan (Onan, 2021) explores the sentiment classification issue for Turkish tweets. In addition, the study analyzes different word embedding-based features using supervised learning algorithms (e.g., Naïve Bayes, SVM) and ensemble learning techniques (e.g.,
AdaBoost, Random Subspace). The proposed system
defines nine weighting schemes. Two are unsupervised (TF-IDF and term frequency), and seven are super-
vised: odds ratio, relevance frequency, balanced dis-
tributional concentration, inverse question frequency-
question frequency-inverse category frequency, short
text weighting, inverse gravity moment, and regular-
ized entropy (Onan, 2021). Tweets were collected for two months via the Twitter API to build the dataset. A manual annotation phase determines whether a message is