English tweets using VADER, SentiWordNet,
AFINN, and TextBlob. The sentiment scores were
classified into three groups: positive,
negative, and neutral. The results showed that the
VADER lexicon produced the best performance in
terms of accuracy and computational efficiency.
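
As a rough illustration of the three-way grouping described above, the sketch below maps a lexicon score to a positive, negative, or neutral label. The word scores in `TOY_LEXICON`, the averaging rule, and the ±0.05 cut-offs are assumptions made for this example (the cut-offs follow the convention commonly used with VADER's compound score); none of them is taken from the cited study.

```python
# Toy lexicon: hypothetical word scores, not taken from VADER or any real lexicon.
TOY_LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.6, "terrible": -0.8}


def lexicon_score(text: str) -> float:
    """Average the scores of known words; 0.0 when no word is in the lexicon."""
    words = text.lower().split()
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def classify(score: float, threshold: float = 0.05) -> str:
    """Map a score to one of three groups using symmetric cut-offs (assumed)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical example texts, one per expected group.
texts = ["great service", "terrible food", "the store opens at nine"]
labels = [classify(lexicon_score(t)) for t in texts]
```

A real lexicon such as VADER also handles negation, intensifiers, and punctuation, which this sketch deliberately omits.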
Deep sentiment analysis was done by
combining an unsupervised topic model and a deep
learning model based on Long Short-Term Memory
(LSTM) Recurrent Neural Network (RNN) (Jelodar
et al., 2020). The data were gathered from subreddits to
analyse 563,079 COVID-19–related comments in
English. This research used the LDA topic model and
Gibbs sampling for semantic extraction and latent
topic discovery. The results showed those methods
achieved 81.15% accuracy, which was higher than
traditional machine learning algorithms.
Sentiment analysis was done by combining the
supervised and unsupervised machine learning
methods (El Rahman et al., 2019). This research used
data in English from Twitter for two subjects: 7,000
tweets for McDonald's and 7,000 tweets for KFC. The
unsupervised algorithm was used to label data. The
supervised algorithms (NB, SVM, Maximum Entropy
(MaxEnt), DT, RF, and Bagging) were used to
classify data. The results showed that MaxEnt had
the highest accuracy.
The performances of five supervised
classification methods were compared for sentiment
analysis (Renault, 2020). These methods include NB,
MaxEnt, Linear Support Vector Classifier, RF, and
MLP. This research used two datasets in English: one
balanced dataset containing 500,000 positive
messages and 500,000 negative messages, and one
unbalanced dataset containing 800,000 positive
messages and 200,000 negative messages. The results
showed that more complex algorithms did not
increase classification accuracy, and that simple
algorithms like NB and MaxEnt might be sufficient
to derive sentiment indicators.
Sentiment analysis was done using NB and the
Lexicon dictionary for Twitter (Rasool et al., 2019).
The data were 99,850 English tweets collected using the
apparel brand names "Nike" and "Adidas".
The results showed that Adidas had more
positive sentiment than Nike.
Sentiment analysis was used to predict and
analyse the Presidential election in Indonesia using
the Twitter API (Budiharto & Meiliana, 2018). Data
were gathered from four survey institutes in Indonesia.
This research used a training set of 250 tweets
and a test set of 100 tweets. The results showed that
this method was far simpler than other methods yet
proved sufficient to produce a reliable result.
The performances of five supervised
classification methods were compared for sentiment
analysis (Al-Amrani et al., 2017). These methods
include PART, DT, NB, Logistic Regression, and
SVM. Data were taken from the "SMS Spam
Collection Data Set", which contained 5,574 English SMS
messages divided into two types: positive and negative.
The results showed that Logistic Regression
had the highest number of correctly classified
instances followed by SVM, NB, PART and DT.
Sentiment analysis was done using TF-IDF and
some functions in R (Widyaningrum et al., 2019). The
data used were 2,352 tweets in English. The scoring
process yielded 323 negative tweets and
1,543 positive tweets. The ratio of
positive to negative opinions overall
was 4.78.
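
The reported ratio can be checked directly from the counts given above. The short sketch below does only that arithmetic; the variable names are illustrative, and how the study treated the tweets that fall in neither group is not stated here, so the remainder is simply reported as a count.

```python
positive = 1543  # positive tweets reported in the study
negative = 323   # negative tweets reported in the study
total = 2352     # tweets in the dataset

# Ratio of positive to negative opinions.
ratio = positive / negative

# Tweets counted in neither group (their treatment is not described here).
remainder = total - positive - negative

print(f"positive/negative ratio: {ratio:.2f}")
print(f"tweets in neither group: {remainder}")
```

Rounded to two decimals, the ratio agrees with the 4.78 reported by the authors.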
Sentiment analysis was done by comparing word
embedding and TF-IDF as the feature extraction
methods for three classification models: deep neural
networks (DNN), Convolutional Neural Networks
(CNN), and Recurrent Neural Networks (RNN)
(Dang et al., 2020). The data comprised eight datasets
containing tweets in English. The results showed that
the DNN technique performed better with word embedding than
with TF-IDF, and that CNN outperformed the other models,
presenting a good balance between accuracy and CPU
runtime.
Research related to hotel sentiment analysis was
done with the Naïve Bayes Multinomial method
(Farisi et al., 2019). The research data was taken from
the Business Data Database consisting of 5,000
sentences in English divided into 3,946 sentences
labelled 1 (positive) and 1,053 sentences labelled 0
(negative). The results showed an average
F1-score of 91.4%.
Research related to travel agent sentiment analysis
was done with KNN, NB, and SVM (Poernomo &
Suharjito, 2019). The research data, in Indonesian, was taken from
the OTA applications Traveloka, Agoda, and Tiket,
and split into 70% training data and 30% test data.
The results showed that the KNN method had
the best accuracy, 96.32%.
From the description above, it can be concluded
that TF-IDF combined with SVM had the best
performance compared to other techniques. However,
those studies were carried out on English texts. For text
in Indonesian, the performance of SVM with TF-IDF
was also reported to be good, but it was not compared with
other classification techniques. Therefore, in this paper,
the performance of TF-IDF in the classification process
was compared across five classification techniques for
sentiment analysis of Indonesian text.
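
For readers unfamiliar with the weighting scheme, the following is a minimal sketch of TF-IDF, assuming one common textbook variant: term frequency as count divided by document length, and inverse document frequency as log(N / df). The function name `tf_idf`, the whitespace tokenisation, and the toy Indonesian sentences are all hypothetical and chosen for illustration; they are not the configuration used in this paper, and library implementations often apply smoothing or normalisation that changes the exact values.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df), one common
    textbook variant (an assumption; libraries often add smoothing).
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy reviews in Indonesian, for illustration only.
docs = ["hotel ini bagus", "hotel ini buruk", "pelayanan bagus sekali"]
weights = tf_idf(docs)
```

In this toy corpus, a term that appears in a single document (such as "buruk") receives a higher weight than one shared across documents (such as "hotel"). The resulting weights would then be arranged into feature vectors for whichever classifier is being evaluated.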