models: CNN, LSTM, and GRU, as shown in Table 4.
The number of epochs was set to 50 because it yielded the highest results among the randomly chosen values that were tried; the same value was selected by (Lehečka et al., 2020) when they classified large-scale multi-label Wikipedia datasets. The batch size was set to 32 because it is an adequate default value according to (Garcia-Silva et al., 2020). The learning rate was set to 5e-5, which matches the value selected by (Sun et al., 2019). The stride was set to 1, the same value selected by (Srivastava et al., 2020) and the most popular value for the stride (Togashi et al., 2018). Finally, ReLU (Rectified Linear Unit) was adopted as the activation function, as was also done by (Nie et al., 2020).
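To make these settings concrete, the following minimal sketch shows how they could be wired into a Keras-style 1D-CNN classifier. The layer sizes, the 768-dimensional input (e.g., BERT-base vectors), and the training call are illustrative assumptions and not the exact architecture used in this research.

# Minimal sketch of the hyperparameter settings described above, assuming a
# Keras 1D-CNN text classifier over pre-computed embedding vectors; the layer
# sizes (64 filters, kernel size 3) are placeholders, not taken from the paper.
import tensorflow as tf

EPOCHS = 50           # highest-scoring value among those tried
BATCH_SIZE = 32       # default suggested by Garcia-Silva et al. (2020)
LEARNING_RATE = 5e-5  # matches Sun et al. (2019)

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, kernel_size=3, strides=1,   # stride of 1
                           activation="relu",              # ReLU activation
                           input_shape=(None, 768)),       # e.g. BERT-base vectors
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),        # binary polarity
])
model.compile(optimizer=tf.keras.optimizers.Adam(LEARNING_RATE),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE)  # hypothetical data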
The results shown in Table 4 reflect that the combination of the BERT word embedding model with the LSTM model surpassed its combinations with the other deep learning models, achieving an F1-score of 99.48%. In general, the combinations of the BERT word embedding model with the deep learning models used in this research produced considerably higher F1-scores than those achieved by the machine learning classifiers employed in this research.
Table 4: The F1-Scores Resulting from Combining the BERT Word Embedding Model with a Set of Deep Learning Models.

Deep Learning Model    BERT F1-Score
CNN                    99.19%
LSTM                   99.48%
GRU                    94.73%
BERT                   98.22%
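As an illustration of how such a combination can be assembled, the sketch below feeds BERT token embeddings into an LSTM classifier. It assumes the Hugging Face bert-base-uncased checkpoint and placeholder layer sizes; it is not the authors' exact implementation.

# Illustrative sketch (not the authors' exact pipeline) of combining BERT token
# embeddings with an LSTM classifier, assuming the Hugging Face
# `bert-base-uncased` checkpoint; layer sizes and example data are placeholders.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = TFAutoModel.from_pretrained("bert-base-uncased")

def embed(texts, max_len=128):
    """Return BERT's last hidden states (batch, seq_len, 768) for a list of reviews."""
    enc = tokenizer(texts, padding="max_length", truncation=True,
                    max_length=max_len, return_tensors="tf")
    return bert(**enc).last_hidden_state

classifier = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, input_shape=(128, 768)),  # reads the BERT vectors
    tf.keras.layers.Dense(1, activation="sigmoid"),     # positive / negative polarity
])
classifier.compile(optimizer=tf.keras.optimizers.Adam(5e-5),
                   loss="binary_crossentropy", metrics=["accuracy"])
# features = embed(["The head gasket failed after a month"])  # hypothetical review
# classifier.fit(features, labels, epochs=50, batch_size=32)  # hypothetical labels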
9 DATA VISUALIZATION
There is a set of sentiment words and clauses that the classifiers depend on to determine the polarity of the classification. For instance, when the Bernoulli Naïve Bayes classifier analysed the sentiments in the dataset, the following sets of positive and negative words and clauses assisted in determining the sentiment polarity of the text. As stated above, some of these sentiment words and clauses (such as recall, of power, and head gasket, among others) are rarely used outside of the automobile domain. This justifies limiting the scope of this research to automobile data, since general-purpose sentiment analysers will arguably be unable to classify these words and clauses correctly. Some of these words and clauses are illustrated in Figure 1 and also listed below:
Positive words and clauses:
['more fun', 'world', 'sweet', 'much good', 'it very', 'be
good', 'be great', 'be much', 'white', 'genesis',
'beautiful', 'best', 'best car', 'excellent', 'awesome',
'nice', 'overall', 'look great', 'comfy', 'one of most', 'one
of best', 'very nice', 'car but', 'safe', 'very comfortable',
'of power', 'fantastic', 'fine', 'of best', 'fun drive'].
Negative words and clauses:
['car but', 'terrible', 'not like', 'crap', 'break',
'unreliable', 'crappy', 'fall', 'noise', 'failure',
'uncomfortable', 'awful', 'ugly', 'certain', 'recall', 'po',
'pricey', 'fail', 'but be', 'and not', 'more expensive', 'but
not', 'gasket', 'horrible', 'head gasket', 'worst', 'hate',
'hat', 'weak', 'poor'].
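As a hedged sketch of how such indicative words and clauses can be surfaced, the code below ranks unigrams and bigrams by the gap in class-conditional log-probabilities of a trained Bernoulli Naïve Bayes model. The toy reviews, labels, and vectorizer settings are hypothetical and not the paper's exact setup.

# Hypothetical sketch: surface the most indicative unigrams and bigrams from a
# trained Bernoulli Naive Bayes model; the toy data below stands in for the
# paper's automobile review dataset.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

reviews = ["best car ever, fun to drive", "head gasket failure, terrible"]  # toy data
labels = [1, 0]                                                             # 1 = positive

vectorizer = CountVectorizer(ngram_range=(1, 2), binary=True)  # words and clauses
X = vectorizer.fit_transform(reviews)
clf = BernoulliNB().fit(X, labels)

# Rank features by the log-probability gap between the positive and negative class.
terms = vectorizer.get_feature_names_out()
pos_idx = list(clf.classes_).index(1)   # column of the positive class
neg_idx = list(clf.classes_).index(0)   # column of the negative class
gap = clf.feature_log_prob_[pos_idx] - clf.feature_log_prob_[neg_idx]
top_positive = terms[np.argsort(gap)[-10:]]
top_negative = terms[np.argsort(gap)[:10]]
print(top_positive, top_negative)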
10 CONCLUSION AND FUTURE WORK
Despite the improvements suggested for future work below, it is evident that the methodology used in this research successfully filled the gaps left unaddressed by other contributions on the analysis of sentiments expressed in automobile reviews. In terms of results, the combination of the BERT word embedding model with the LSTM model achieved the highest F1-score, which presents an opportunity for researchers to adopt such a combination to analyse the sentiments in English automobile data in particular, and in non-English automobile data in general. The methodology adopted in this research also achieved higher F1-scores than those reported by the other works reviewed in this paper.
In future work, adopting advanced models such as reinforcement learning, ERNIE, and ELMo could enhance the results and widen the scope of this area of research. The results could also be improved by enlarging the dataset, handling negations, and developing word embedding models whose vocabulary is more specific to the automobile industry. The scope of the work could also be broadened by covering semi-supervised approaches (Alsafari et al., 2021). Finally, performing aspect-based sentiment analysis would yield a more precise sentiment analysis of the opinion target.