For more details, see Table 2, which summarises
the best results obtained on each dataset.
Table 2: Best classification results for each dataset.
Dataset Configuration Accuracy F-measure
wacht7ass PV+SVM+TF-IDF 81,63 % 84,47 %
G-Form PVT+Naive Bayes+TF-IDF 80,87 %
G-Form PVT+SVM+TF 78,32 %
Brandt PVT+SVM+TF-IDF 94,24 % 90,67 %
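For readers who want to reproduce the Accuracy and F-measure columns of Table 2 on their own predictions, the following is a minimal sketch of the standard definitions. It assumes binary labels and computes the F-measure as the F1 score of the positive class; the averaging actually used in our experiments may differ, and the function name and label strings are illustrative only.

```python
def accuracy_and_f_measure(y_true, y_pred, positive="positive"):
    """Return (accuracy, F-measure) for two equal-length label lists.

    Assumption: the F-measure is the F1 score of the `positive` class.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return accuracy, f_measure


if __name__ == "__main__":
    gold = ["positive", "positive", "negative", "negative"]
    pred = ["positive", "negative", "negative", "negative"]
    acc, f1 = accuracy_and_f_measure(gold, pred)
    print(f"accuracy={acc:.2%}  F-measure={f1:.2%}")
```

Because the two measures weight errors differently, the configuration with the best accuracy on a dataset need not be the one with the best F-measure.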
In this paper, we presented a supervised approach to
sentiment analysis of the Algerian dialect written in Latin
script, which gave promising results despite the many
specific features of the dialect and the complexity of
Arabizi analysis. We reported results from an extensive
empirical evaluation assessing the effects of the classifiers,
of the representation types (count, TF, TF-IDF) and of the
novel contributions in the preprocessing phase, notably
vowel removal. Three datasets were annotated with their
respective sentiment labels through crowdsourcing for these
experiments. We achieved an F-score of 87 % and an accuracy
of 83 % using this approach. The results also showed that
SVM outperforms the other classifiers. Finally, the
preprocessing allowed us to improve the F-score of SVM by
9,20 %, which is considerable and supports our initial
premises.
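As an illustration of the two ingredients summarised above, the sketch below shows vowel removal followed by the three representation types (count, TF, TF-IDF). It is a simplified stand-in and not the exact pipeline used in our experiments: the vowel set, the tokenizer, the unsmoothed IDF formula log(N/df) and all function names are assumptions made for the example, and the sample words are illustrative only.

```python
import math
import re
from collections import Counter

# Assumption: only these Latin vowels are removed; digits are kept because
# Arabizi uses them as letters (e.g. "7").
VOWELS = re.compile(r"[aeiou]")


def remove_vowels(token):
    """Strip vowels so spelling variants collapse to one consonant skeleton."""
    token = token.lower()
    stripped = VOWELS.sub("", token)
    return stripped or token  # keep vowel-only tokens unchanged


def tokenize(text):
    """Lowercase, split on non-alphanumerics, then remove vowels."""
    return [remove_vowels(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def vectorize(docs, scheme="tfidf"):
    """Return one {term: weight} dict per document.

    scheme: "count" (raw counts), "tf" (count / document length) or
    "tfidf" (tf * log(N / df)), an unsmoothed variant chosen for clarity.
    """
    if scheme not in {"count", "tf", "tfidf"}:
        raise ValueError(f"unknown scheme: {scheme}")
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {}
        for term, c in counts.items():
            if scheme == "count":
                vec[term] = float(c)
            elif scheme == "tf":
                vec[term] = c / len(toks)
            else:
                vec[term] = (c / len(toks)) * math.log(n_docs / doc_freq[term])
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    print(remove_vowels("mlih"), remove_vowels("mliha"))
    print(vectorize(["mlih bezzaf", "mliha"], scheme="tf"))
```

The point of the preprocessing step is visible in the first call: two spellings that differ only in their vowels map to the same feature, which reduces the sparsity that inconsistent Arabizi transliteration would otherwise cause.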
Our work can be extended in several directions. First,
we will test other models (random forest, gradient-boosted
trees, Latent Dirichlet Allocation). We could also explore
other features, such as emoji interpretation and
irony/sarcasm detection, or other areas of the opinion
mining field, notably subjectivity analysis and rumor
detection.
