feedback when planning trips (Pelsmacker et al.,
2018).
The contribution of this paper is a feasibility study
of a big data web application (Boulaalam,
Oumayma, et al., 2018) that analyses and classifies
opinions (negative, positive) based on tourists'
feedback about hotels, museums, and restaurants in
the Draa-Tafilalet region (K. AL Fararni, 2021).
The remainder of the paper is structured as
follows: Section 2 provides a short literature review;
Section 3 presents the methodology used in our
model; Section 4 reports the results; Section 5
briefly presents our application; and Section 6
concludes the paper and outlines future work.
2 LITERATURE REVIEW
Sentiment analysis, also referred to as opinion mining
or emotion AI, is a set of analytic methods that aims
to extract knowledge embedded in subjective text on
the internet. These methods fall into three
categories: lexicon-based methods, machine
learning methods, and deep learning methods.
The lexicon-based methods rely on a
well-constructed sentiment lexicon to determine the
text polarity. They include dictionary-based,
corpus-based, and manual approaches.
The dictionary-based approach starts from a
small set of seed sentiment words whose polarity
was set manually. The set is then expanded
iteratively using a well-known lexicon, e.g.,
WordNet (Miller et al., 1990).
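The iterative expansion described above can be sketched as follows. This is a minimal illustration, not the procedure of any cited work: the `SYNONYMS` and `ANTONYMS` tables are hypothetical toy data standing in for a real lexicon such as WordNet, and the rule that synonyms inherit a seed's polarity while antonyms receive the opposite one is a common convention assumed here.

```python
# Hypothetical toy relations standing in for a real lexicon such as WordNet.
SYNONYMS = {
    "good": ["great", "fine"],
    "great": ["excellent"],
    "bad": ["poor", "awful"],
    "poor": ["terrible"],
}
ANTONYMS = {
    "good": ["bad"],
    "excellent": ["terrible"],
}


def expand_lexicon(seeds, max_iterations=3):
    """Grow a {word: polarity} lexicon from manually labelled seed words.

    Assumption: synonyms inherit the polarity of the word they come from,
    antonyms receive the opposite polarity. Words already in the lexicon
    keep their first assigned polarity.
    """
    lexicon = dict(seeds)
    frontier = list(seeds)
    for _ in range(max_iterations):
        next_frontier = []
        for word in frontier:
            polarity = lexicon[word]
            for synonym in SYNONYMS.get(word, []):
                if synonym not in lexicon:
                    lexicon[synonym] = polarity
                    next_frontier.append(synonym)
            for antonym in ANTONYMS.get(word, []):
                if antonym not in lexicon:
                    lexicon[antonym] = -polarity
                    next_frontier.append(antonym)
        if not next_frontier:
            break  # no new words were found, so the lexicon has converged
        frontier = next_frontier
    return lexicon


# A single manually labelled seed word (+1 = positive, -1 = negative).
lexicon = expand_lexicon({"good": 1})
```

Starting from one positive seed, the toy lexicon grows to cover both polarities because antonym links flip the sign; with a real lexicon, the iteration cap limits the drift in meaning that long synonym chains tend to introduce.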
Like the dictionary-based one, the corpus-based
approach starts with a small set of manually
calibrated sentiment words. The set is then expanded
using a large corpus and a set of predetermined
rules and formulas such as LDA and pointwise mutual
information (PMI). Unlike the dictionary-based
approach, the corpus-based one can link words to
context; in other words, the word polarity depends on
the context, not on a predefined value.
The manual approach relies on the manual collection
and labeling of the lexicon, which requires a
significant amount of human effort and time.
Qiu et al. (2010) improved a targeted advertising
strategy using a dictionary-based approach. Their
approach extracted topics and opinions from
sentences, which helped identify consumers'
attitudes towards topics, resulting in more accurate
advertising. Rajput and Haider (2016) performed a
lexicon-based sentiment analysis of students'
end-of-course evaluations of professors. Their
approach uses a dictionary to compute the sentiment
score of each piece of feedback; this score is then
used to determine whether the feedback is positive,
negative, or mixed.
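A dictionary-based scoring scheme of this kind can be sketched as below. This is an illustrative assumption, not the exact procedure of Rajput and Haider (2016): the `LEXICON` entries are hypothetical, the score is a plain count of positive and negative words, and the "neutral" label for feedback with no lexicon words is added here only so that the function always returns a label.

```python
import re

# Hypothetical toy lexicon; a real system would use a curated dictionary.
LEXICON = {
    "clear": 1, "helpful": 1, "excellent": 1,
    "boring": -1, "confusing": -1, "unfair": -1,
}


def score_feedback(text, lexicon=LEXICON):
    """Count the positive and negative lexicon words found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(1 for token in tokens if lexicon.get(token, 0) > 0)
    negative = sum(1 for token in tokens if lexicon.get(token, 0) < 0)
    return positive, negative


def classify_feedback(text, lexicon=LEXICON):
    """Label feedback from its word counts.

    Assumption: feedback containing both polarities is "mixed";
    feedback with no lexicon words is "neutral".
    """
    positive, negative = score_feedback(text, lexicon)
    if positive and negative:
        return "mixed"
    if positive:
        return "positive"
    if negative:
        return "negative"
    return "neutral"
```

For instance, `classify_feedback("Helpful professor but unfair grading")` finds one word of each polarity and therefore returns `"mixed"`. The sketch ignores negation and intensifiers, which a practical lexicon-based system would need to handle.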
The machine learning methods rely on
classification methods, supervised and unsupervised,
to determine the polarity of textual content. These
methods exploit features such as the bag of words
(BOW), part-of-speech (POS) tags, n-grams, and the
TF-IDF model.
Among the various classification strategies for
detecting users' emotions from their text, SVM,
LDA, linear regression, Naïve Bayes, and
artificial neural networks are the most common and
achieve the highest performance. Nikhil Kumar
Singh et al. (2020) compared different algorithms
(SVM, LR, NB, RF) with multiple feature extraction
techniques (BOW, POS, Hash Tagging) on two
data sets: the Twitter sentiment corpus data set and
the Stanford data set. The results showed that SVM
and NB with POS performed best, scoring 83.27%
and 83.13%, respectively, on the first data set and
81.34% and 80.12% on the second. Fang Luo et al.
(2016) proposed a feature selection algorithm,
CHI-square Difference between the Positive and
Negative Categories (CDPNC), that uses both
Document Frequency (DF) and Chi-Squared (CHI).
This method was tested with three algorithms: SVM,
KNN, and NB. The experimental results show that the
classification efficiency of the proposed system
outperforms the state of the art, especially when used
with SVM.
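To make the feature-related terms above concrete, the sketch below builds a binary bag-of-words representation and scores a term with the standard chi-squared statistic for a 2x2 term/category contingency table. It illustrates only the plain CHI measure, not the CDPNC algorithm, whose exact formula is not given here; the reviews, labels, and function names are hypothetical.

```python
import re


def binary_bag_of_words(text):
    """Return the set of lowercase word tokens present in a document."""
    return set(re.findall(r"[a-z']+", text.lower()))


def chi_square(term, docs, labels, category):
    """Chi-squared statistic between a term and a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = term in doc
        in_category = label == category
        if present and in_category:
            a += 1
        elif present:
            b += 1
        elif in_category:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # term or category is constant: no association measurable
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical labelled reviews, used only for illustration.
reviews = [
    ("great room and great staff", "pos"),
    ("great location", "pos"),
    ("dirty room", "neg"),
    ("rude staff and dirty bathroom", "neg"),
]
docs = [binary_bag_of_words(text) for text, _ in reviews]
labels = [label for _, label in reviews]

great_score = chi_square("great", docs, labels, "pos")
room_score = chi_square("room", docs, labels, "pos")
```

In this toy corpus, "great" appears only in positive reviews and receives a high score, while "room" is spread evenly across both classes and scores zero, so a feature selector would keep the former and drop the latter. Note that the statistic measures the strength of association, not its direction.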
The deep learning methods use deep network
architectures to output text polarity. The strength of
this approach comes from the use of advanced word
embedding tools like word2vec and GloVe
(Pennington et al., 2014). The RNN class of
neural networks was the obvious choice for
sequential data like text. However, recent research
showed that combining the RNN and CNN classes
results in better performance (Behera et al., 2021).
Zhou et al. (2016) proposed two bidirectional LSTM
models for sentiment analysis. The first combines a
BiLSTM with two-dimensional pooling
(BLSTM-2DPool); the second combines a BiLSTM
with a two-dimensional convolution layer
(BLSTM-2DCNN). The BLSTM-2DCNN outperformed
not only the RecNN, RNN, and CNN models but
also the BLSTM-2DPool on multiple data sets,
namely Stanford Sentiment Treebank (SST), TREC,
and 20Newsgroups (20Ng). Weihang Huang et al.
(2018) proposed a document-level sentiment
analysis model, SSR-LSTM. This model
first removes sentences with weak emotions and then