2 FUNDAMENTAL CONCEPTS
In this work, we propose a broad definition of the concept of “misinformation spreader”, formulated as follows:
Definition 2.1. Let $u = \{U_u, E_u\}$ be a user of a social network $N$, associated with a set $U_u = \{u_1, u_2, \dots, u_m\}$ of $m$ other users with whom $u$ has a connection, and a set of engagements $E_u = \{e_{u_1}, e_{u_2}, \dots, e_{u_n}\}$, where each $e_{u_i} = \{p_i, a, t\}$ represents an engagement of $u$ with the publication $p_i$, through the action $a$, at time $t$. Let $Q(u) \mapsto [0, 1]$ be a misinformation score assigned to $u$ and let $\tau$ be a decision threshold. Detecting misinformation spreaders is the task of learning a prediction function $G(u, \tau) \mapsto \{0, 1\}$ satisfying:

$$G(u, \tau) = \begin{cases} 1 \text{ (is a misinf. spreader)}, & \text{if } Q(u) \geq \tau \\ 0 \text{ (is not a misinf. spreader)}, & \text{if } Q(u) < \tau \end{cases}$$
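As a minimal illustration of this decision rule, the Python sketch below implements $G(u, \tau)$ given a scorer standing in for $Q(u)$; the function `misinformation_score` is a hypothetical placeholder, since in practice $Q(u)$ is precisely what a detection model must learn:

```python
# Minimal sketch of the decision rule G(u, tau) from Definition 2.1.
# `misinformation_score` is a hypothetical stand-in for the learned score Q(u).

def misinformation_score(user) -> float:
    """Hypothetical scorer Q(u) in [0, 1]; a real system would learn this."""
    # Placeholder: fraction of the user's engagements flagged as misleading.
    flagged = sum(1 for e in user["engagements"] if e.get("misleading", False))
    return flagged / max(len(user["engagements"]), 1)

def G(user, tau: float = 0.5) -> int:
    """Returns 1 if u is classified as a misinformation spreader, else 0."""
    return 1 if misinformation_score(user) >= tau else 0

# Usage: a toy user with three engagements, two flagged as misleading.
u = {"engagements": [{"misleading": True}, {"misleading": True}, {"misleading": False}]}
print(G(u, tau=0.5))  # -> 1
```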
The specific criteria for categorizing a user as a misinformation spreader may vary according to the social network analyzed or the particular behavior one wants to detect. In any case, the defining trait is that the user posts or shares misinformation with unusual frequency, or in an unusual proportion, compared to other users of that social network. That is, a misinformation spreader publishes a large amount of misleading publications, or most of their publications contain false information. It is not, therefore, a gullible user who maintains regular activity on the social network and occasionally publishes unreliable information, but a user engaged in disseminating misinformation at an abnormal rate compared to regular users. It is essential to highlight that, depending on the social network, this behavior often violates its community policies.
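One way to operationalize the notion of “unusual proportion” is to compare each user's share of misleading publications against the population statistics of the network. The sketch below uses an illustrative mean-plus-$k$-standard-deviations rule, which is an assumption for exposition, not a method proposed in this work:

```python
import statistics

# Illustrative sketch: flag users whose share of misleading posts is
# abnormally high relative to the rest of the network (mean + k * stdev).
def abnormal_spreaders(misleading_share: dict[str, float], k: float = 1.5) -> set[str]:
    shares = list(misleading_share.values())
    threshold = statistics.mean(shares) + k * statistics.pstdev(shares)
    return {user for user, share in misleading_share.items() if share >= threshold}

# Usage: most users rarely post misinformation; one user does so constantly.
shares = {"alice": 0.02, "bob": 0.00, "carol": 0.05, "mallory": 0.90}
print(abnormal_spreaders(shares))  # -> {'mallory'}
```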
3 RELATED WORK
The detection of misinformation spreaders is still a little-addressed problem in the context of the Portuguese language. Most existing works address the bot detection problem. In (Leite et al., 2020), a set of rules was proposed to describe and classify bots on Twitter. The rules are based on user behavior and take as input the number of bookmarked tweets, the rate of answered tweets, and the average number of retweets. Using a decision tree, users can be classified according to these rules. The best result achieved an AUC of 0.97 on the dataset collected by (Cresci et al., 2017).
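As a rough sketch of this rule-based approach, the snippet below trains a decision tree on the three behavioral features mentioned; the feature values, labels, and resulting thresholds are toy data for illustration, not the rules of (Leite et al., 2020) or the dataset of (Cresci et al., 2017):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy illustration: each row is [n_bookmarked_tweets, answered_tweet_rate,
# avg_retweets]; labels are 1 for bot, 0 for human (made-up values).
X = [
    [0,  0.01, 250.0],   # bot-like: never bookmarks, rarely answers, mass-retweets
    [1,  0.02, 180.0],
    [30, 0.40,   2.5],   # human-like: interacts organically
    [12, 0.35,   1.0],
]
y = [1, 1, 0, 0]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[2, 0.05, 300.0]]))  # -> [1] (classified as a bot)
```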
In (Benevenuto et al., 2008), the authors investigate the problem of detecting malicious users (spammers) on the YouTube platform. Users are represented by three groups of features: user features, video features, and social network features. User features include the number of videos added to YouTube, number of friends, number of videos watched, number of videos added as favorites, number of video responses sent and received, number of subscriptions, number of subscribers, and the maximum number of videos added in a day. Video features include the average video length and the number of views, ratings, comments, favorites, honorable mentions, and external links on posted videos. Social network features include clustering coefficient, user rank, betweenness, reciprocity, and assortativity. Using these features, an F1 score of 0.81 was obtained on the malicious user detection task.
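A hypothetical encoding of such grouped features, with illustrative field names rather than the authors' exact attribute set, could look as follows:

```python
from dataclasses import dataclass, astuple

# Hypothetical grouping of the three feature families described above;
# the field names are illustrative, not the authors' exact attributes.
@dataclass
class UserFeatures:
    # User features
    n_videos_uploaded: int
    n_friends: int
    n_videos_watched: int
    # Video features
    avg_video_length_s: float
    avg_views: float
    # Social network features
    clustering_coefficient: float
    reciprocity: float

def to_vector(f: UserFeatures) -> list[float]:
    """Flatten the grouped features into one vector for a classifier."""
    return [float(x) for x in astuple(f)]

u = UserFeatures(15, 40, 1200, 210.0, 350.5, 0.12, 0.4)
print(to_vector(u))  # -> [15.0, 40.0, 1200.0, 210.0, 350.5, 0.12, 0.4]
```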
The effectiveness of popular classifiers, such as Random Forest and AdaBoost, in detecting bots was evaluated in (Morais and Digiampietri, 2022). The results pointed to a degradation of classifier performance when the models were exposed to new datasets, different from the one used during training. This result derives, among other factors, from the dependence on profile-based features, which bot developers frequently change whenever they realize that certain features are being used by detection algorithms.
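The evaluation protocol behind this observation can be sketched as follows: train on one bot dataset, then compare held-out performance on that dataset against performance on an independently collected one. The code below is a generic illustration of such a cross-dataset check, not the experimental setup of (Morais and Digiampietri, 2022):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Generic cross-dataset evaluation: train on dataset A, then compare the
# held-out score on A against the score on an independently collected
# dataset B to expose the degradation described above.
def cross_dataset_eval(X_a, y_a, X_b, y_b):
    X_tr, X_te, y_tr, y_te = train_test_split(X_a, y_a, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    f1_in = f1_score(y_te, clf.predict(X_te))   # same-dataset performance
    f1_out = f1_score(y_b, clf.predict(X_b))    # typically much lower
    return f1_in, f1_out
```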
In (Shahid et al., 2022), the authors provide a comprehensive survey of state-of-the-art methods for detecting malicious users and bots based on different features. In (Rath et al., 2021), the authors presented SCARLET (truSt and Credibility bAsed gRaph neuraL nEtwork model using aTtention), a model to predict misinformation spreaders on Twitter. Using real-world Twitter datasets, they show that SCARLET is able to predict false information spreaders with an accuracy of over 87%. In (Rath and Srivastava, 2022), the authors proposed a framework for false information mitigation on Twitter based on a complementary approach inspired by the domain of epidemiology, where false information is analogous to an infection, the social network to a population, and the likelihood of people believing a piece of information to their vulnerability to infection.
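This analogy can be made concrete with a toy susceptible-infected style simulation over a follower graph, where each exposed neighbor adopts the false information with probability equal to their vulnerability. The propagation rule below is purely illustrative and is not the framework of (Rath and Srivastava, 2022):

```python
import random

# Toy "infection" spread: in each step, every neighbor of an infected user
# adopts the false information with probability equal to their vulnerability.
def spread(graph: dict[str, list[str]], vulnerability: dict[str, float],
           seeds: set[str], steps: int = 3, rng=random.Random(0)) -> set[str]:
    infected = set(seeds)
    for _ in range(steps):
        newly = {v for u in sorted(infected) for v in graph.get(u, [])
                 if v not in infected and rng.random() < vulnerability[v]}
        infected |= newly
    return infected

# Usage: user "a" seeds the false information into a small follower graph.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
vuln = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.5}
print(spread(graph, vuln, seeds={"a"}))
```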
In (Heidari et al., 2021), the authors analyzed sentiment features and their effect on the accuracy of machine learning models for social media bot detection on Twitter. A new set of sentiment features was extracted from the text of tweets and used to train bot detection models. In addition, they proposed a new model for the Dutch language and achieved more than 87% accuracy on Dutch tweets using the new sentiment features.
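As a sketch of how tweet-level sentiment can be turned into user-level features for a bot detector, the snippet below averages VADER polarity scores over a user's tweets; VADER is an English-language lexicon used here only as a stand-in scorer, not the feature set of (Heidari et al., 2021):

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Stand-in sentiment scorer: VADER assigns neg/neu/pos/compound scores
# per tweet; we average them into one feature vector per user.
nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

def sentiment_features(tweets: list[str]) -> list[float]:
    """Average the neg/neu/pos/compound scores over a user's tweets."""
    scores = [sia.polarity_scores(t) for t in tweets]
    n = max(len(scores), 1)
    return [sum(s[k] for s in scores) / n
            for k in ("neg", "neu", "pos", "compound")]

print(sentiment_features(["I love this!", "This is terrible news."]))
```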