labeling approach used. We present and compare the results of applying several machine learning models to that data, further comparing two sets of features. We demonstrate the satisfactory performance of several models, notably those based on Random Forest and AdaBoost, on test sets generated with different approaches, which suggests the models are capable of generalizing to other contexts of fake news identification. On the mainly keyword-based S1 data set, our models achieved a best F-measure of 0.74; on other test sets, results reached 0.94.
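As an illustration of the kind of comparison described above, the following sketch fits Random Forest and AdaBoost classifiers and reports the F-measure of each on a held-out split. It uses scikit-learn with synthetic data as a stand-in for our feature sets, so the numbers it prints are not those reported in this work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced stand-in for the labeled tweet features.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.7, 0.3], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Fit each model and evaluate it with the F-measure on the held-out split.
for clf in (RandomForestClassifier(random_state=42),
            AdaBoostClassifier(random_state=42)):
    clf.fit(X_train, y_train)
    f1 = f1_score(y_test, clf.predict(X_test))
    print(f"{type(clf).__name__}: F-measure = {f1:.2f}")
```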
We investigated and demonstrated the contribution of features derived from named entities and emotion recognition to enhancing the automatic identification of fake news. For the three algorithms that consistently provided the best overall results, these features improved the F-measure in three of the four test sets used. We believe such a model is an important tool with several possible uses, from alerting end users about potentially unreliable content to assisting organizations in automatically filtering questionable content for screening, and can contribute to mitigating this problem that affects us all.
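To make the emotion-derived features concrete, here is a minimal, hypothetical sketch of lexicon-based emotion counting in the spirit of NRC-style word–emotion associations. The toy lexicon, emotion set, and function name are illustrative stand-ins, not the implementation used in this work.

```python
# Toy word–emotion lexicon: each word maps to the emotions it evokes.
TOY_LEXICON = {
    "fraud": {"anger", "fear"},
    "hoax": {"anger", "disgust"},
    "win": {"joy", "anticipation"},
}
EMOTIONS = ("anger", "fear", "disgust", "joy", "anticipation")

def emotion_features(text: str) -> dict:
    """Count, per emotion, how many tokens the lexicon associates with it."""
    counts = {emotion: 0 for emotion in EMOTIONS}
    for token in text.lower().split():
        for emotion in TOY_LEXICON.get(token, ()):
            counts[emotion] += 1
    return counts

# Each tweet yields one count vector, usable alongside other features.
print(emotion_features("this hoax is a fraud"))
```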
ACKNOWLEDGEMENTS
This work is financed by National Funds through the Portuguese funding agency, FCT – Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.
A Mixed Model for Identifying Fake News in Tweets from the 2020 U.S. Presidential Election