Authors:
Anis Charfi¹; Andria Atalla¹; Raghda Akasheh¹; Mabrouka Bessghaier¹ and Wajdi Zaghouani²
Affiliations:
¹ Carnegie Mellon University in Qatar, Education City, Doha, Qatar
² Hamad Bin Khalifa University, Education City, Doha, Qatar
Keyword(s):
Natural Language Processing, Hate Speech, Arabic Language, Dialectal Arabic, Annotation, Arabic Corpus.
Abstract:
A significant issue in today’s global society is hate speech, which is defined as any kind of expression that attempts to degrade an individual or a group based on attributes such as race, color, nationality, gender, or religion (Schmidt and Wiegand, 2017). In this paper, we present a Web-based hate speech detection system that focuses on the Arabic language and supports its various dialects. The system detects hate speech in a single sentence or in a file containing multiple sentences. Behind the scenes, it uses the AraBERT model trained on ADHAR, the hate speech corpus we developed in previous work. For each input sentence, the system determines whether hate speech is present and labels the sentence as either “Hate” or “Not hate”. It also identifies different categories of hate speech, such as race-based and religion-based hate speech. We experimented with various machine learning models; our system achieved its highest accuracy, together with an F1-score of 0.94, when using AraBERT. Furthermore, we have extended the tool to accept input files in CSV format and to visualize the output as polarization pie charts, enabling the analysis of large datasets.
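The batch mode described in the abstract (CSV input, per-sentence classification, and an aggregated polarization pie chart) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the CSV is assumed to have a header column named `text`, the names `classify_csv`, `polarization_shares` and `toy_classifier` are hypothetical, and `toy_classifier` is a trivial keyword stand-in where the real system would call its fine-tuned AraBERT model.

```python
from __future__ import annotations

import csv
import io
from collections import Counter
from typing import Callable

# Binary output labels named in the abstract.
LABELS = ("Hate", "Not hate")


def classify_csv(
    csv_text: str,
    classify: Callable[[str], str],
    column: str = "text",  # assumed header name of the sentence column
) -> list[tuple[str, str]]:
    """Classify every non-empty sentence in a CSV document (one sentence per row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    results: list[tuple[str, str]] = []
    for row in reader:
        sentence = (row.get(column) or "").strip()
        if sentence:  # skip rows with an empty sentence cell
            results.append((sentence, classify(sentence)))
    return results


def polarization_shares(results: list[tuple[str, str]]) -> dict[str, float]:
    """Fraction of sentences per label, i.e. the slice sizes of a pie chart."""
    counts = Counter(label for _, label in results)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


def toy_classifier(sentence: str) -> str:
    """Hypothetical stand-in for the fine-tuned AraBERT model (NOT the real classifier)."""
    return "Hate" if "[HATE]" in sentence else "Not hate"


if __name__ == "__main__":
    sample = "text\nsentence one\n[HATE] sentence two\nsentence three\n[HATE] sentence four\n"
    shares = polarization_shares(classify_csv(sample, toy_classifier))
    print(shares)
```

Passing the classifier in as a parameter keeps the CSV handling and aggregation independent of the model, so the stub could be swapped for a real model without changing the rest. The resulting shares could then be handed to a plotting routine such as Matplotlib's `pyplot.pie` to draw the chart.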