Authors:
Lucas Cabral (1); José Maria Monteiro (1); José Wellington Franco da Silva (1); César Lincoln Mattos (1) and Pedro Jorge Chaves Mourão (2)
Affiliations:
(1) Computer Science Department, Federal University of Ceará, Fortaleza, Ceará, Brazil; (2) State University of Ceará, Fortaleza, Brazil
Keyword(s):
Misinformation Detection, Fake News Detection, Natural Language Processing, WhatsApp, Social Media.
Abstract:
In the past few years, the large-scale dissemination of misinformation through social media has become a critical issue, harming the trustworthiness of legitimate information, social stability, democracy and public health. Thus, developing automated misinformation detection methods has become a field of high interest both in academia and in industry. In many developing countries such as Brazil, India, and Mexico, one of the primary sources of misinformation is the messaging application WhatsApp. Despite this scenario, due to the private messaging nature of WhatsApp, there are still few misinformation detection methods developed specifically for this platform. In this work we present FakeWhatsApp.BR, a dataset of WhatsApp messages in Brazilian Portuguese, collected from Brazilian public groups and manually labeled. In addition, we evaluated a series of misinformation classifiers combining Natural Language Processing-based feature extraction techniques and a set of well-known machine learning algorithms, totaling 108 different scenarios. Our best result achieved an F1 score of 0.73, and the error analysis indicates that errors occur mainly due to the predominance of short texts that accompany media files. When texts with fewer than 50 words are filtered out, the F1 score rises to 0.87.
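As a rough illustration of the two evaluation ideas mentioned in the abstract (the F1 score and the removal of texts with fewer than 50 words), the following is a minimal sketch in plain Python. The function names `f1_score` and `keep_long_texts`, the label encoding (1 = misinformation) and the whitespace-based word count are assumptions made for this example; they are not taken from the paper and do not reproduce the authors' pipeline.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall.

    Assumption: labels are encoded so that `positive` marks misinformation.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def keep_long_texts(texts, labels, min_words=50):
    """Drop messages with fewer than `min_words` whitespace-separated tokens.

    Assumption: a "word" is any whitespace-delimited token; the paper may
    count words differently.
    """
    kept = [(t, y) for t, y in zip(texts, labels) if len(t.split()) >= min_words]
    return [t for t, _ in kept], [y for _, y in kept]
```

Under these assumptions, one would apply `keep_long_texts` to the evaluation set first and then compute `f1_score` on the remaining messages to compare the filtered and unfiltered settings.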