Authors:
Aillkeen de Oliveira 1; Cláudio Baptista 1; Anderson Firmino 1 and Anselmo Cardoso de Paiva 2
Affiliations:
1 Federal University of Campina Grande, Rua Aprigio Veloso, 882 - Universitário, Campina Grande, Paraiba, Brazil;
2 Federal University of Maranhão, Av. dos Portugueses, 1966 - Vila Bacanga, São Luís, Maranhão, Brazil
Keyword(s):
Hate Speech Detection, Natural Language Processing, Cross-Lingual Learning.
Abstract:
In the Internet age, people are increasingly connected. They have complete freedom of speech and are able to
share their opinions with society on social media. However, freedom of speech is often used to spread
hate speech. This type of behavior can lead to criminality and may result in negative psychological effects.
Computer technology is therefore useful for detecting and, consequently, mitigating this kind
of cyber attack. This paper proposes the use of a state-of-the-art model for detecting politics-related
hate speech on social media. We used three datasets with a significant lexical distance between them. The
datasets are in English, Italian, and Filipino. To detect hate speech, we propose the use of a Pre-Trained
Language Model (PTLM) with Cross-Lingual Learning (CLL), along with techniques such as Zero-Shot (ZST),
Joint Learning (JL), Cascade Learning (CL), and CL/JL+. We achieved 94.3% in the F-Score
metric using the CL/JL+ strategy with the Italian and Filipino datasets as the source languages and the English
dataset as the target language.