Bilingual Emotion Analysis on Social Media throughout the COVID19 Pandemic in Portugal
Alina Trifan¹, Sérgio Matos¹, Pedro Morgado² and José Luís Oliveira¹
¹DETI/IEETA, University of Aveiro, Portugal
²School of Medicine, University of Minho, Portugal
Keywords:
Sentiment Analysis, Social Media, COVID-19.
Abstract:
This paper presents preliminary work on emotion analysis on Twitter in the context of the coronavirus pandemic in Portugal. We collected, curated and analyzed COVID-related tweets from users in Portugal in order to understand the evolution of the six basic emotions reflected in these tweets. We analyzed tweets written in both English and Portuguese. In this first step of our work, we correlate this information with key events in the evolution of the pandemic in Portugal during March, which was the most critical period in the country. We do so in an attempt to estimate the online manifestation of the psychological toll that this pandemic takes on the overall well-being of the general population. Our findings show that the sentiment analysis of COVID-related tweets is consistent with our hypothesis that negative emotions would intensify as the pandemic progressed. These preliminary results stand as a proof of concept that the analysis of real-time tweets or other social media messages through sentiment analysis can be an important tool for behavioural and well-being tracking.
1 INTRODUCTION
For the last several months, the attention of health care providers around the world has been focused on the new coronavirus disease (COVID-19) and its spread. In addition to efforts at various levels to prevent the spread of the disease and other worrisome conditions, special attention should be paid to mental health and care. Experience from similar epidemics and pandemics shows that serious concerns, such as fear of death, can arise among patients, and that feelings of loneliness and insecurity can develop. Moreover, people who are quarantined lose face-to-face connections and traditional social interventions, which significantly lowers their personal and mental well-being.
In this short paper, we present preliminary work on the analysis of social media posts for inferring the prevalence and evolution of the six basic human emotions in Portugal during the first months of the pandemic (21 January to 31 May 2020). Our hypothesis is that negative emotions would be predominant, implying a decrease in personal well-being and possibly in mental health, and that their intensity would follow the evolution of COVID-19 cases in Portugal. We want to understand this evolution. We perform sentiment analysis as an initial step towards a broader goal: monitoring mental health and well-being among social network users in these challenging times. The remainder of this paper is organized in five sections. Section 2 briefly discusses the current background in sentiment analysis and in mental health and well-being monitoring in the context of the COVID-19 pandemic. In Section 3 we detail the process of data collection and curation. We present our analysis methodology in Section 4 and the results obtained so far in Section 5. Finally, we conclude the paper and discuss future research steps in Section 6.
2 BACKGROUND
In the current context, global attention has largely
been focused on the infected patients and the front-
line responders, with some marginalised populations
in society having been overlooked (Ho et al., 2020).
Previous research has revealed a profound and broad spectrum of psychological impact that outbreaks can inflict on people (Hall et al., 2008; Müller, 2014).
Several recent publications in the area of mental health raise the alarm about the need for new methods of responding to
Trifan, A., Matos, S., Morgado, P. and Oliveira, J.
Bilingual Emotion Analysis on Social Media throughout the COVID19 Pandemic in Portugal.
DOI: 10.5220/0010244204300434
In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF, pages 430-434
ISBN: 978-989-758-490-9
Copyright © 2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
mental health needs during the pandemic. Xiang et al. (Xiang et al., 2020) claim that the mental health needs of patients with confirmed COVID-19, patients with suspected infection, quarantined family members, and medical personnel have been poorly handled throughout this time. Similarly, Yao et al. (Yao et al., 2020) express their concerns with regard to the effect of the pandemic on people with mental health disorders. For Wind et al. (Wind et al., 2020), the obvious way to continue mental health care during a pandemic is to provide it at a warm distance, through video-conferencing psychotherapy and internet interventions. This idea is further supported by a study among Chinese citizens (Gao et al., 2020), in which Gao et al. assess the prevalence of mental health problems and examine their association with social media exposure. Their findings highlight a link between the two, which suggests that the government needs to pay more attention to mental health problems and their online manifestation. A Portuguese study also found that individuals who were already receiving psychotherapeutic support benefited when they did not interrupt it as a consequence of the outbreak (Moreira et al., 2020).
The scientific evidence reviewed above strongly suggests that mental health care provision needs to shift towards online prevention and treatment in the near future. In this context, several developments offer exciting possibilities for improving both population-level and individual-level mental health (Conway and O'Connor, 2016). These include the widespread use of social media, the rapid development of computational infrastructures to support big data, and the maturation of natural language processing and machine learning technologies.
3 DATA COLLECTION
This work builds on prior research contributions, both national and international, that enabled the collection of both social media and clinical statistics data. For social media data, we collected just over four months of Twitter¹ posts whose content contained coronavirus-related vocabulary, as explained next. The data collection process followed the pipeline recently published by Chen et al. (Chen et al., 2020). In accordance with Twitter's Terms and Conditions, Chen et al. released Tweet IDs, which are unique identifiers tied to specific tweets containing coronavirus-related terms. The collection of IDs is available on GitHub².
¹ www.twitter.com
² https://github.com/echen102/COVID-19-TweetIDs
Using these IDs, we queried the Twitter API and retrieved the complete tweets (a process commonly called hydration). For each tweet in the collection, we retrieved the tweet content (text, URLs, hashtags) and the author's metadata. Following the authors' suggestions, we used Hydrator³, Twitter's search API⁴ and Twarc⁵ to retrieve the data. All tweets included in this collection contain coronavirus-related vocabulary, either as hashtags or as keywords mentioned in the text of the tweet. We collected tweets posted from 21 January to 31 May 2020. More details on the collection process can be found in the original publication (Chen et al., 2020).
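Chen et al. distribute only tweet IDs, so hydration amounts to reading the ID files and sending the IDs to the Twitter API in batches. The sketch below covers only the local part of that process, using the standard library. It assumes, as a simplification, that the ID files are plain text with one numeric ID per line. It also assumes a limit of 100 IDs per lookup request, which is the limit of Twitter's v1.1 `statuses/lookup` endpoint. The helper names `read_tweet_ids` and `batches` are hypothetical. The actual API calls, which Hydrator and Twarc handle, are deliberately left out.

```python
from pathlib import Path
from typing import Iterable, Iterator, List

# Assumption: at most 100 IDs per lookup request (Twitter v1.1 statuses/lookup).
BATCH_SIZE = 100


def read_tweet_ids(directory: str) -> List[str]:
    """Collect unique tweet IDs from every .txt file under `directory`.

    Assumes one numeric ID per line; blank or non-numeric lines are skipped.
    IDs are kept as strings so that no numeric precision can be lost.
    """
    seen = set()
    ids: List[str] = []
    for path in sorted(Path(directory).rglob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            tweet_id = line.strip()
            if tweet_id.isdigit() and tweet_id not in seen:
                seen.add(tweet_id)
                ids.append(tweet_id)
    return ids


def batches(ids: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` IDs."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each yielded batch would then be handed to a hydration client. For example, 250 IDs produce batches of 100, 100 and 50. Keeping the file reading separate from the network calls makes it easy to resume an interrupted hydration from a given batch.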
We filtered the collected tweets based on the tweet's geo-location or the user's location. When users post tweets from a GPS-enabled device or tag a location in their tweet, this information can be retrieved as latitude and longitude coordinates. Unfortunately, only roughly 1–3% of Twitter messages are geocoded (Paul and Dredze, 2017). Because the number of geo-located tweets in the corpora we collected is indeed limited, we used the location defined in the user's profile to extract tweets from Portuguese users. Using Tweepy⁶, we filtered for user locations that match a dictionary of city names in Portugal. We removed false positives by discarding locations that also match names of cities or states in other countries, namely Brazil (e.g. a profile location of Porto Alegre, a city in Brazil, would otherwise be matched to the Portuguese city of Porto). We ended up with a list of 76898 distinct users and 117772 distinct tweets.
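The profile-location filter can be sketched as whole-word string matching. The matching logic does not depend on Tweepy, which would only supply the `location` strings. Both place dictionaries below are tiny hypothetical excerpts, since the full city list is not reproduced here. The rule that any match against a foreign place name discards the profile is one possible reading of the false-positive removal described above. Case and accent folding with `unicodedata` is an added assumption, so that "Évora" and "evora" compare equal.

```python
import re
import unicodedata
from typing import Iterable, List, Optional

# Hypothetical excerpts; the study's full dictionaries are not reproduced here.
PORTUGUESE_CITIES = {"lisboa", "lisbon", "porto", "braga", "coimbra", "aveiro", "evora", "faro"}
FOREIGN_PLACES = {"porto alegre", "porto seguro", "sao paulo", "rio de janeiro", "brasil", "brazil"}


def fold(text: str) -> str:
    """Lowercase and strip diacritics, e.g. 'Évora' -> 'evora'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def contains_phrase(location: str, phrase: str) -> bool:
    """True if `phrase` occurs in `location` as whole words."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", location) is not None


def match_portuguese_city(location: Optional[str]) -> Optional[str]:
    """Return the matched Portuguese city, or None if there is no unambiguous match."""
    if not location:
        return None
    loc = fold(location)
    # A foreign place sharing a name with a Portuguese city (e.g. 'Porto Alegre'
    # vs 'Porto') takes precedence, so the profile is discarded.
    if any(contains_phrase(loc, place) for place in FOREIGN_PLACES):
        return None
    for city in sorted(PORTUGUESE_CITIES):
        if contains_phrase(loc, city):
            return city
    return None


def filter_users(locations: Iterable[Optional[str]]) -> List[str]:
    """Keep only the profile locations that match a Portuguese city."""
    return [loc for loc in locations if loc and match_portuguese_city(loc)]
```

Whole-word matching avoids accidental substring hits. Checking foreign names first implements the precedence rule: a location such as "Porto Alegre, RS" is rejected even though it contains "Porto".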
For clinical statistics and the identification of key events in the manifestation of the COVID-19 pandemic in Portugal, we relied on an open-source repository maintained by Data Science for Social Good Portugal⁷, an open community of data scientists who tackle relevant and current societal issues. This repository contains daily updates of the statistical information released by the Portuguese Ministry of Health regarding COVID-19 cases in Portugal.
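Official bulletins of this kind report cumulative counts. Daily increments are easier to relate to day-to-day changes in emotion, and they can be derived by differencing. The sketch below uses only the standard `csv` module. It assumes a CSV with a date column named `data` in `DD-MM-YYYY` format and cumulative columns such as `confirmados` or `suspeitos`. These column names and the date format are assumptions about the repository's layout and should be checked against the actual file. The function name `daily_new_cases` is hypothetical.

```python
import csv
from datetime import date, datetime
from typing import Dict, List, TextIO, Tuple


def daily_new_cases(handle: TextIO, column: str = "confirmados") -> Dict[str, int]:
    """Turn a cumulative count column into daily increments keyed by ISO date.

    Assumes a 'data' column formatted DD-MM-YYYY (hypothetical layout).
    Rows with an empty value in `column` are skipped; rows are sorted by date.
    """
    rows: List[Tuple[date, int]] = []
    for row in csv.DictReader(handle):
        value = (row.get(column) or "").strip()
        if not value:
            continue
        day = datetime.strptime(row["data"], "%d-%m-%Y").date()
        rows.append((day, int(float(value))))
    rows.sort()
    new_cases: Dict[str, int] = {}
    previous = 0  # counts before the first row are assumed to be zero
    for day, cumulative in rows:
        new_cases[day.isoformat()] = cumulative - previous
        previous = cumulative
    return new_cases
```

The first day's increment equals its cumulative value, because earlier days are assumed to be zero. Skipped rows simply fold their change into the next reported day.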
4 DATA ANALYSIS
Psychologist Paul Ekman identified six basic emotions that he suggested were universally experienced in all human cultures: happiness (joy), sadness, disgust, fear, surprise, and anger (Ekman, 1992). Happiness is often classified as a positive emotion, while the remaining five basic emotions are classified as negative. We studied the basic emotions reflected in the tweets of Portuguese users at the word level, using Natural Language Processing (NLP), both in English and in Portuguese. Words conveying these discrete emotions were tabulated and counted, and their frequencies were measured.
³ https://github.com/DocNow/hydrator
⁴ https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets
⁵ https://github.com/DocNow/twarc
⁶ https://www.tweepy.org/
⁷ https://github.com/dssg-pt/covid19pt-data
We first filtered tweets by language and kept only those written in English or Portuguese. In the preprocessing step, we performed tokenization, stemming and lemmatization, lowercase conversion and stopword removal for each of the two languages.
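The language filter and the first preprocessing steps can be sketched with the standard library alone. This is not the NLTK pipeline used in the study. The regular-expression tokenizer and the two stopword sets are tiny hypothetical stand-ins for NLTK components such as `TweetTokenizer` and `stopwords.words('english')` / `stopwords.words('portuguese')`. Stemming and lemmatization are left out. The language filter relies on the `lang` attribute that Twitter attaches to each tweet object. The function name `preprocess` is hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, deliberately tiny stopword lists; NLTK ships complete ones.
STOPWORDS = {
    "en": {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "i", "it", "this"},
    "pt": {"o", "a", "os", "as", "e", "de", "do", "da", "em", "que", "um", "uma", "é", "estou"},
}

# Remove URLs and @mentions before tokenizing.
NOISE_RE = re.compile(r"https?://\S+|@\w+")
# A token is a run of letters/digits (accented letters included), optionally
# joined by apostrophes, as in "i'm".
TOKEN_RE = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def preprocess(tweet: Dict) -> Optional[List[str]]:
    """Return lowercase, stopword-free tokens, or None for other languages."""
    lang = tweet.get("lang")
    if lang not in STOPWORDS:
        return None  # keep only English ('en') and Portuguese ('pt') tweets
    text = NOISE_RE.sub(" ", (tweet.get("text") or "").lower())
    # '#' is not matched by TOKEN_RE, so '#covid19' yields the token 'covid19'.
    tokens = TOKEN_RE.findall(text)
    return [tok for tok in tokens if tok not in STOPWORDS[lang]]
```

For example, the Portuguese tweet "Estou muito preocupado com o #COVID19" yields `['muito', 'preocupado', 'com', 'covid19']`. If stemming is added after this step, the emotion lexicons must be stemmed the same way, or their entries will no longer match the tokens.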
We implemented this pipeline in Python⁸, using the Natural Language Toolkit⁹ as the NLP framework. For the tweets written in English, we used Empath (Fast et al., 2016) to retrieve the counts and frequencies of the six basic emotions. Empath is a lexicon mined from modern text on the web using a combination of deep learning and crowdsourcing. Each of these emotions corresponds to an Empath lexical category, which is formed by the emotion word and a collection of other similar words that convey the same emotion.
For the tweets written in Portuguese, a psychiatrist on our research team defined a semantic cluster of Portuguese words associated with each of the six basic emotions. These semantic clusters are presented in Table 1. For both languages, we used the frequency of lexicon words in a tweet to measure the strength of a specific emotion. For example, suppose a tweet contained two words coded as joy (i.e. two occurrences of terms belonging to the joy lexical category or semantic cluster) and one word coded as fear. In that case, joy was counted twice and fear was counted once. These counts were then normalized with respect to the total number of tokens in the tweet.
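The counting and normalization step can be made concrete with the Portuguese clusters of Table 1. In this sketch, which is an illustration rather than the authors' code, each token found in a cluster adds one to that emotion. Each count is then divided by the tweet's total number of tokens. Accent folding, so that "esperanca" also matches "esperança", is an added assumption. For English tweets, the Empath package plays the corresponding role; its `analyze` method offers a `normalize` option that divides category counts by the number of tokens.

```python
import unicodedata
from typing import Dict, List

# Semantic clusters from Table 1.
CLUSTERS: Dict[str, List[str]] = {
    "joy": ["feliz", "felicidade", "espetacular", "esperança", "expetativa", "fantástico", "wow", "alegria"],
    "sadness": ["triste", "deprimido", "depressivo", "tristeza"],
    "disgust": ["nojo", "contaminação", "repulsa", "contágio"],
    "fear": ["medo", "ansioso", "preocupado", "apreensivo", "nervoso"],
    "surprise": ["surpreendido", "surpresa", "inesperado"],
    "anger": ["wtf", "merda", "pqp", "fdx", "revoltado", "zangado", "irritado", "enervado"],
}


def fold(word: str) -> str:
    """Lowercase and strip diacritics (assumption: accents are ignored when matching)."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Reverse index: folded cluster word -> emotion.
WORD_TO_EMOTION = {fold(w): emo for emo, words in CLUSTERS.items() for w in words}


def emotion_scores(tokens: List[str]) -> Dict[str, float]:
    """Count cluster hits per emotion and normalize by the number of tokens."""
    counts = {emo: 0 for emo in CLUSTERS}
    for tok in tokens:
        emo = WORD_TO_EMOTION.get(fold(tok))
        if emo is not None:
            counts[emo] += 1
    total = len(tokens)
    return {emo: (c / total if total else 0.0) for emo, c in counts.items()}
```

Take the example in the text: two joy words and one fear word in a four-token tweet such as `["feliz", "alegria", "medo", "hoje"]`. This gives a joy score of 0.5 and a fear score of 0.25, with all other emotions at 0.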
5 RESULTS
We present preliminary results on the sentiment analysis of the evolution of the six basic emotions of Portuguese Twitter users, along with statistical information regarding the corpus that we curated. We consider this curation of tweets written by Portuguese users an important scientific contribution in itself. We will publish the list of tweet IDs, annotated with the language in which each tweet was written, as open source upon paper acceptance. In doing so, we hope to encourage further exploration of this data for gaining general well-being insights in Portugal, and to provide a possible basis for comparison with
⁸ www.python.org
⁹ https://www.nltk.org/
Table 1: Portuguese language semantic clusters defined for
each of the six basic emotions.
Emotion               Semantic cluster
Joy (alegria)         feliz; felicidade; espetacular; esperança; expetativa; fantástico; wow; alegria
Sadness (tristeza)    triste; deprimido; depressivo; tristeza
Disgust (nojo)        nojo; contaminação; repulsa; contágio
Fear (medo)           medo; ansioso; preocupado; apreensivo; nervoso
Surprise (surpresa)   surpreendido; surpresa; inesperado
Anger (raiva)         wtf; merda; pqp; fdx; revoltado; zangado; irritado; enervado
Table 2: Number of tweets per month, by language: Portuguese (PT), English (EN) and other languages. Note that for January only posts from 21 to 31 January were retrieved.
Month       # PT    # EN   # Other
January     2679    2262        79
February   17277    3312      1175
March      29158   11663      1745
April      10022    6401      1006
May        12407    5733       941
similar studies going on in other countries.
Table 2 gives an overview of the number of tweets posted by Portuguese users per month, from 21 January to 31 May. For each month, we indicate the number of tweets written in Portuguese and in English, as well as the total number of tweets written in other languages. Among the tweets written in languages other than Portuguese and English, Spanish, French and Italian were the most common.
The total number of tweets per month alone indicates that March was the most critical month of the pandemic in Portugal. The first COVID-19 positive cases in Portugal were confirmed at the beginning of March (2 March 2020), in small numbers. Partial lockdown started in Portugal by mid-March and slowly progressed into total lockdown by the end of the month. The number of COVID-19-related tweets published in this period reflects the increased interest and concern that the pandemic generated during the month of March.
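The monthly totals behind this observation can be recomputed directly from the counts in Table 2. The same applies to the ratio between Portuguese and English tweets discussed in the following paragraph. The short script below only performs arithmetic on the published figures. The names `TABLE_2` and `monthly_summary` are illustrative.

```python
from typing import Dict, Tuple

# Counts copied from Table 2: (Portuguese, English, other languages).
TABLE_2: Dict[str, Tuple[int, int, int]] = {
    "January": (2679, 2262, 79),  # 21-31 January only
    "February": (17277, 3312, 1175),
    "March": (29158, 11663, 1745),
    "April": (10022, 6401, 1006),
    "May": (12407, 5733, 941),
}


def monthly_summary(table: Dict[str, Tuple[int, int, int]]) -> Dict[str, Tuple[int, float]]:
    """Return (total tweets, PT/EN ratio) for each month."""
    return {month: (pt + en + other, pt / en) for month, (pt, en, other) in table.items()}


if __name__ == "__main__":
    for month, (total, ratio) in monthly_summary(TABLE_2).items():
        print(f"{month:<9} total={total:>6} PT/EN={ratio:.2f}")
```

March has the highest total (42566 tweets, about twice February's 21764). The PT/EN ratios for February (about 5.22) and March (about 2.50) exceed those for April (about 1.57) and May (about 2.16).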
Another important remark is that the ratio between
the number of tweets written in Portuguese and the
number of tweets written in English during February
HEALTHINF 2021 - 14th International Conference on Health Informatics
and March is higher than in April and May, which might suggest an increased focus on the development of the national COVID-19 situation rather than the worldwide one. The main hypothesis of the present study is that the collected tweets would reflect increasingly negative emotions as the pandemic progressed, up to its peak, which in Portugal is estimated to have occurred around 23-25 March 2020. The evolution of COVID-19 cases in Portugal, in terms of both the infected population and the number of suspected cases, is presented in Fig. 1.
Figure 1: COVID-19 cases in Portugal throughout the
month of March.
We examined the patterns of the tweets and the
evolution of the basic emotions to see whether they
were consistent with our expectations. We performed
this analysis for all the tweets collected so far. How-
ever, due to the late-breaking nature of this research,
we will focus here on the month of March, the most
critical one so far in Portugal.
Fig. 2 shows the number of tweets per day written by Portuguese users during the month of March. Figure 3 shows the normalized evolution of the six basic emotions in COVID-related tweets written in English by Portuguese users over the month of March 2020. As we can see, negative emotions are predominant, with little joy or positive sentiment present in these tweets. Moreover, there is a notable spike in fear around the period when home isolation started. On 7 March, the first COVID-19 cases were found among the student population, and on the following day two universities were closed.
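The paper does not specify how per-tweet scores become the daily curves of Figure 3. One plausible reading, sketched here as an assumption, is to average each emotion's normalized score over the tweets posted on each day. The negative category would then be the sum of the five negative emotions, and joy alone would form the positive category, following the grouping given in Section 4. The record layout, pairs of an ISO date and a score dictionary, and the name `daily_means` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# Grouping from Section 4: joy is positive, the other five emotions are negative.
NEGATIVE = ("sadness", "disgust", "fear", "surprise", "anger")
POSITIVE = ("joy",)


def daily_means(
    records: Iterable[Tuple[str, Dict[str, float]]]
) -> Dict[str, Dict[str, float]]:
    """Average per-tweet emotion scores per day and add aggregate categories.

    `records` yields (ISO date, {emotion: normalized score}) pairs; an emotion
    missing from a tweet's dict contributes 0 to that day's mean.
    """
    sums: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: DefaultDict[str, int] = defaultdict(int)
    for day, scores in records:
        counts[day] += 1
        day_sums = sums[day]  # create the entry even if `scores` is empty
        for emotion, value in scores.items():
            day_sums[emotion] += value
    result: Dict[str, Dict[str, float]] = {}
    for day in sorted(counts):
        means = {emo: total / counts[day] for emo, total in sums[day].items()}
        means["negative"] = sum(means.get(e, 0.0) for e in NEGATIVE)
        means["positive"] = sum(means.get(e, 0.0) for e in POSITIVE)
        result[day] = means
    return result
```

ISO dates sort chronologically as strings, so the returned dictionary is already in plotting order. Averaging over tweets, rather than summing, keeps days with very different tweet volumes comparable, such as the spikes visible in Fig. 2.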
6 CONCLUSIONS AND FUTURE DIRECTIONS
Social media reflects the lives and attitudes of a population. As such, social media sentiment analysis and behavioural tracking have the potential to become important tools for making public health provision more personalized and efficient.
Figure 2: Number of tweets per day during March. Tweets written in English are shown in blue and tweets written in Portuguese in orange. There are two clear spikes in the number of Portuguese tweets: one at the beginning of the month, with the first suspected COVID-19 cases, and a second one around the period when home isolation was decreed in Portugal.
Figure 3: Evolution of the basic emotion frequencies in tweets written in English and Portuguese during March. The evolution of the aggregated negative emotion and positive emotion categories is also shown.
In this paper, we presented the evolution of the six basic emotions on Twitter during the COVID-19 pandemic in Portugal, which can be correlated with the evolution of the disease. We found the emotional patterns of the tweets largely consistent with our expectations, with more negative tweets posted at the beginning of the pandemic, up to its peak.
As future work, we are interested in analyzing this dataset in more detail in order to gain knowledge about the mental health and well-being status of social media users. To this end, we will explore psycholinguistic features and models previously trained on social media corpora of mental-health-related issues in order to better understand the impact of this pandemic.
ACKNOWLEDGEMENTS
This work was supported by the Integrated Pro-
gramme of SR&TD SOCA (Ref. CENTRO-01-0145-
FEDER-000010), co-funded by Centro 2020 pro-
gram, Portugal 2020, European Union, through the
European Regional Development Fund and by the
EU/EFPIA Innovative Medicines Initiative 2 Joint
Undertaking under grant agreement No 806968.
REFERENCES
Chen, E., Lerman, K., and Ferrara, E. (2020). Tracking
social media discourse about the covid-19 pandemic:
Development of a public coronavirus twitter data set.
JMIR Public Health and Surveillance, 6(2):e19273.
Conway, M. and O’Connor, D. (2016). Social media, big
data, and mental health: current advances and ethical
implications. Current opinion in psychology, 9:77–82.
Ekman, P. (1992). An argument for basic emotions. Cogni-
tion & emotion, 6(3-4):169–200.
Fast, E., Chen, B., and Bernstein, M. S. (2016). Empath:
Understanding topic signals in large-scale text. In Pro-
ceedings of the 2016 CHI Conference on Human Fac-
tors in Computing Systems, pages 4647–4657. ACM.
Gao, J., Zheng, P., Jia, Y., Chen, H., Mao, Y., Chen, S.,
Wang, Y., Fu, H., and Dai, J. (2020). Mental health
problems and social media exposure during covid-19
outbreak. Plos one, 15(4):e0231924.
Hall, R. C., Hall, R. C., and Chapman, M. J. (2008). The
1995 kikwit ebola outbreak: lessons hospitals and
physicians can apply to future viral epidemics. Gen-
eral hospital psychiatry, 30(5):446–452.
Ho, C. S., Chee, C. Y., and Ho, R. C. (2020). Mental
health strategies to combat the psychological impact
of covid-19 beyond paranoia and panic. Ann Acad
Med Singapore, 49(1):1–3.
Moreira, P. S., Ferreira, S., Couto, B., Machado-Sousa, M.,
Fernandez, M., Raposo-Lima, C., Sousa, N., Pico-
Perez, M., and Morgado, P. (2020). Protective ele-
ments of mental health status during the covid-19 out-
break in the portuguese population. medRxiv.
Müller, N. (2014). Infectious diseases and mental health. Comorbidity of Mental and Physical Disorders, page 99.
Paul, M. J. and Dredze, M. (2017). Social monitoring for
public health. Synthesis Lectures on Information Con-
cepts, Retrieval, and Services, 9(5):1–183.
Wind, T. R., Rijkeboer, M., Andersson, G., and Riper, H.
(2020). The covid-19 pandemic: The ‘black swan’ for
mental health care and a turning point for e-health.
Internet Interventions.
Xiang, Y.-T., Yang, Y., Li, W., Zhang, L., Zhang, Q., Che-
ung, T., and Ng, C. H. (2020). Timely mental health
care for the 2019 novel coronavirus outbreak is ur-
gently needed. The Lancet Psychiatry, 7(3):228–229.
Yao, H., Chen, J.-H., and Xu, Y.-F. (2020). Patients with
mental health disorders in the covid-19 epidemic. The
Lancet Psychiatry, 7(4):e21.