the dataset contains two classes: the Hate class and
the Normal class. A word embedding technique is
used to extract a set of word features that can
capture the hidden relations among the words of the dataset.
The word embeddings used are the Word2Vec and
the AraVec implementations. Keras (Gulli and Pal,
2017) is the deep learning framework used to
implement the deep learning model. The proposed
deep learning model is a recurrent convolutional
network, which combines a convolutional network
layer with an LSTM network. The proposed
methodology is evaluated based on different
performance evaluation measures, including
accuracy, precision, recall, and the F1 measure. The results of
detecting hate tweets with the deep learning framework
were very good, and they are promising for further
research, as the problem of cyber hate speech detection
in Arabic-speaking countries is poorly investigated.
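As an illustration of the evaluation measures named above, the following minimal sketch computes accuracy, precision, recall, and the F1 measure for a two-class (Hate/Normal) problem. The function name `binary_metrics`, the label strings, and the toy predictions are assumptions made for this example only; they are not taken from the experiments reported in this paper.

```python
def binary_metrics(y_true, y_pred, positive="hate"):
    """Accuracy, precision, recall, and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions for six tweets.
gold = ["hate", "hate", "normal", "normal", "hate", "normal"]
pred = ["hate", "normal", "normal", "hate", "hate", "normal"]
metrics = binary_metrics(gold, pred)
```

In this toy case there are two true positives, one false positive, one false negative, and two true negatives, so all four measures equal 2/3.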
The rest of the paper is organized as follows.
Section 2 reviews related work on cyber hate
speech detection. Section 3 presents the proposed
methodology, including a description of the dataset, the
word embedding and deep learning frameworks, and
the performance evaluation metrics. Section 4
discusses the conducted experiment and the obtained
results. Finally, Section 5 gives concluding remarks
and potential future work.
2 RELATED WORKS
Arabic natural language processing has so far
been sparsely studied, and cyber hate speech
detection in the Arabic context in particular is poorly
investigated. This section reviews previous research on
cyber hate speech detection in the Arabic context.
The authors in (Al-Hassan and Al-Dossari, 2019)
presented the main challenges of articulating hate
speech on Arabic online social networks. They
stated that colloquial Arabic contains many
grammatical and spelling mistakes. Also, some words are
considered hateful in some Arabic-speaking countries
but normal in others. Further, the authors claimed
that all studies conducted in this area
suffer from low recall and precision values. One of
the early attempts at detecting hate speech in
Arabic tweets can be found in (Abozinadah et al., 2015),
in which three types of features were extracted:
profile-based features, tweet-based features
represented by Term Frequency (TF) and Term
Frequency-Inverse Document Frequency (TF-IDF) models, and
social graph features. Their proposed approach
applied traditional machine learning algorithms,
including Naive Bayes (NB), Support Vector Machine
(SVM), and decision tree (DT), and achieved
very good performance in terms of recall, precision,
and F-measure.
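To make the TF and TF-IDF representations mentioned above concrete, a minimal sketch is given below. It assumes already tokenized documents and the plain, unsmoothed weighting tf(t, d) * log(N / df(t)); the exact variant used in the cited work is not stated here, and the function names and toy tokens are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Relative frequency of each term within a single document."""
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(tokens).items()}


def inverse_document_frequencies(docs):
    """log(N / df) for every term, where df counts documents containing it."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(docs):
    """One {term: weight} dictionary per document."""
    idf = inverse_document_frequencies(docs)
    return [
        {term: tf * idf[term] for term, tf in term_frequencies(doc).items()}
        for doc in docs
    ]


# Hypothetical, already tokenized "tweets" (English placeholders).
docs = [["you", "are", "bad"], ["you", "are", "kind"], ["bad", "bad", "day"]]
weights = tf_idf(docs)
```

Under this variant, a term that occurs in every document receives a weight of zero, while terms concentrated in few documents are weighted more heavily.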
In (Mubarak et al., 2017), the authors created a corpus
of cyberbullying words, which was used for abusive
language detection on Arabic social media. Meanwhile,
(Alakrot et al., 2018) constructed a dataset for
offensive speech detection on YouTube, covering
several Arabic dialects. The constructed dataset
encompasses three classes: offensive, inoffensive, and
neutral. Additionally, (Albadi et al., 2018) constructed a
dataset for religious hate speech detection on Arabic
Twitter. In the proposed approach, the authors
developed an Arabic lexicon that contains commonly
used religious terms with their polarities. The
constructed dataset was also applied to different
classification models, involving lexicon-based,
N-gram-based, and deep learning-based approaches.
The implementation of a Recurrent Neural Network
with Gated Recurrent Units and a pre-trained word
embedding model achieved 84% in terms of the Area
Under the Curve (AUC) measure.
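For readers unfamiliar with the AUC measure, the sketch below computes the area under the ROC curve through its pairwise interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy scores are assumptions for illustration and are unrelated to the 84% figure reported in the cited work.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier scores, where higher means "more positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and classifier scores for six examples.
auc = roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.4, 0.5, 0.2, 0.7, 0.1])
```

In this toy case eight of the nine positive-negative pairs are ordered correctly, giving an AUC of 8/9. The quadratic loop is adequate for illustration; a rank-based formulation would be preferable for large datasets.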
In addition, (Mulki et al., 2019) collected a Twitter
dataset for hate speech and abusive language
detection in the Arabic context. The created dataset is a
benchmark dataset known as L-HSAB. The
dataset is labeled with three classes: normal,
abusive, and hate. The authors applied the
N-gram and TF word representation models to SVM
and NB classifiers, where the results were very good
in terms of accuracy, recall, precision, and F-measure.
Similarly, (Haddad et al., 2019) designed a dataset for
hate speech detection in the Tunisian dialect, which aims
at the automatic prevention of any toxic language.
Meanwhile, (Bleiweiss, ) presented an LSTM-based
transfer learning approach for abusive speech detection
on Twitter, which achieved very good results with
regard to F-measure.
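The N-gram word representation mentioned above can be sketched as follows. The example extracts word-level n-grams from a tokenized text and counts them as sparse features that could then be passed to a classifier such as SVM or NB. The helper names, the choice of unigrams plus bigrams, and the toy tokens are assumptions for illustration; the cited works may use different settings.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """All contiguous word n-grams of length n, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens, max_n=2):
    """Counts of all n-grams for n = 1 .. max_n (a sparse feature vector)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(tokens, n))
    return counts


# Hypothetical tokenized tweet (English placeholder).
features = ngram_counts(["this", "is", "a", "tweet"], max_n=2)
```

For the four-token example, the feature vector contains four unigrams and three bigrams, each with a count of one.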
To the best of our knowledge, very few studies have
investigated cyber hate speech detection in the Arabic
online context. Nevertheless, cyber hate speech detection
has also been studied in other languages, such as
English, Italian, and Indonesian. For instance,
(Watanabe et al., 2018) proposed an approach for hate speech
detection on Twitter, in which unigram and
sentiment features were extracted and fed into SVM, DT,
and Random Forest (RF) classifiers, achieving good performance
regarding accuracy, recall, precision, and F-measure.
Meanwhile, (Pitsilis et al., 2018) presented an ensemble
approach of recurrent neural networks for the
detection of racism and sexism on Twitter. In (Del Vigna
et al., 2017), the authors introduced an approach
for Italian hate speech detection on Facebook, where
several syntactical, sentimental, and word embedding
ICPRAM 2020 - 9th International Conference on Pattern Recognition Applications and Methods