Analyzing Tweets Using Topic Modeling and ChatGPT: What We Can Learn About Teachers and Topics During COVID-19 Pandemic-Related School Closures

Anna C. Weigand [1,2,a], Maj F. Jacob, Maria Rauschenberger [2,b] and Maria José Escalona Cuaresma [1,c]

[1] Department of Computer Languages and Systems, University of Seville, Seville, Spain

[2] Faculty of Technology, University of Applied Sciences Emden/Leer, Emden, Germany

Keywords:

Machine Learning, Data Set, Topic Modeling, Twitter, X, Twitterlehrerzimmer, Twlz, Covid, ChatGPT.

Abstract:

This study examines the shifting discussions of teachers within the #twlz community on Twitter across three phases of the COVID-19 pandemic: before school closures and during the first and second school closures. We analyzed tweets from January 2020 to May 2021 to identify topics related to education, digital transformation, and the challenges of remote teaching. Using machine learning and ChatGPT, we categorized discussions that transitioned from general educational content to focused dialogues on online education tools during school closures. Before the pandemic, discussions were generally focused on education and digital transformation. During the first school closures, conversations shifted to more specific topics, such as online education and tools to adapt to distance learning. Discussions during the second school closures reflected more precise needs related to fluctuating pandemic conditions and schooling requirements. Our findings reveal a consistent increase in the specificity and urgency of the topics over time, particularly regarding digital education.

1 INTRODUCTION

Among teachers in Germany, the #twitterlehrerzimmer or #twlz community on Twitter is an established forum for digital exchange (Fütterer et al., 2021). In the following, we refer to it as the #twlz community. The social media platform Twitter (https://twitter.com) is a microblogging service that, since 2006, has allowed users to write short posts (called tweets) of up to 280 characters. Although it was renamed X in July 2023 (Britannica, The Editors of Encyclopaedia, 2024), we refer to it as Twitter in this study because our data set was collected before this change.

During the COVID-19 pandemic, participation in the #twlz community grew, especially during the school closures (Fütterer et al., 2021). The pandemic changed the situation in the schools immediately, forcing schools to take measures such as completely closing down for several weeks (Huber, 2021) or organizing rotating classes (Grill, M., Mascolo, G., Munzinger, P., Zick, T., 2022). Rotating classes meant that some of the students were homeschooled (i.e., taught at a distance by their school teachers) while the others were physically in the classroom, which kept groups of students small and reduced the risk of COVID-19 infection. The School Barometer, which monitors the situation at schools in Germany, Austria, and Switzerland from different perspectives, depicts high stress, struggles, and challenges for teachers during this time (Huber, 2021). Consequently, there was an increased demand among teachers for knowledge exchange. We hypothesize that the importance of specific topics varied according to when the tweets were published, either before or during the COVID-19 pandemic.

[a] https://orcid.org/0000-0003-2674-0640

[b] https://orcid.org/0000-0001-5722-576X

[c] https://orcid.org/0000-0002-6435-1497

In this paper, we explore two primary areas with a mixed-methods approach: (1) the evolution of discussion topics in the #twlz community before and during the first nationwide and second school closures in Germany due to the COVID-19 pandemic and (2) the different methodologies (i.e., topic modeling and ChatGPT) employed to analyze the Twitter data sets.


Weigand, A. C., Jacob, M. F., Rauschenberger, M. and Escalona Cuaresma, M. J.

Analyzing Tweets Using Topic Modeling and ChatGPT: What We Can Learn About Teachers and Topics During COVID-19 Pandemic-Related School Closures.

DOI: 10.5220/0013036900003825

In Proceedings of the 20th International Conference on Web Information Systems and Technologies (WEBIST 2024), pages 350-357

ISBN: 978-989-758-718-4; ISSN: 2184-3252

2 RELATED WORK

Tweets related to the German school closures during the COVID-19 pandemic and before have already been analyzed using a mixed-methods approach (Fütterer et al., 2021). The study combined quantitative and qualitative research approaches to examine discussions within the #twlz community during the first nationwide school closures. The authors explored the differences between the periods before and during the first school closures as well as the opportunities and challenges discussed during the first school closures.

The authors applied tf-idf analysis (term frequency-inverse document frequency), which statistically identifies the importance of strings to a document (Arya, 2022), to two subsets of data: before the nationwide school closures (January 6 to February 17, 2020) and during the first nationwide school closures (March 16 to April 27, 2020). This analysis identified keywords by their frequency of appearance and bigrams that were especially relevant within the tweets. The three most distinctive words were selected to calculate correlations with all other words used in the tweets in order to evaluate the significance of the content based on these bigram networks. In addition, to address the research question of opportunities and challenges, a manual, resource-intensive content analysis following Mayring’s methodology (Mayring, 2015) was conducted on just 270 tweets. These were selected from an initial data set of 128,422 tweets on the basis of high interaction metrics such as retweets, likes, and comments. The bigram networks revealed that the topic digital education had already been discussed before the Germany-wide school closures but that the exchange increased during the first school closures. The authors found that, before the pandemic, users discussed more general topics, such as education and learning, classes and school life, and educational revolution and crisis. While the schools were closed, topics such as mutual help and specific software and tools for teaching and learning became popular (Fütterer et al., 2021). In addition, distance learning, live streaming, flipped learning, and homeschooling were often discussed within the #twlz community during the first school closures in Germany. According to their manual content analysis, the biggest challenges during the nationwide school closures in Germany were good digital classes, missing software, and the lack of digital know-how for digital teaching. Opportunities included networking and exchange within Twitter’s teacher community as well as the offering of digital material, explanations, and tricks.
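To illustrate the tf-idf weighting mentioned above, the following Python sketch computes one common variant: relative term frequency multiplied by the natural logarithm of the inverse document frequency. It is not the implementation used by Fütterer et al. (2021); the function name tf_idf and the toy documents are hypothetical and serve only as an example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy documents, not taken from any of the data sets.
toy_docs = [
    ["schule", "unterricht", "digital"],
    ["schule", "homeschooling", "tool"],
    ["schule", "podcast"],
]
toy_weights = tf_idf(toy_docs)
```

In this toy example, a word that occurs in every document (here "schule") receives a weight of zero, whereas a word that occurs in only one document receives a positive weight. This is why tf-idf highlights words that are distinctive for a document or time period.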

Other studies (Xue et al., 2020a; Xue et al., 2020b) used a machine learning (ML) approach to analyze COVID-19-related tweets but not specifically those tagged #twitterlehrerzimmer or #twlz. Latent Dirichlet Allocation (LDA) was applied to find the discussed topics. These topics were the basis for the authors’ manual content analysis (Braun and Clarke, 2006) to identify themes, such as public health measures to slow the spread of COVID-19, social stigma associated with COVID-19, COVID-19 new cases and deaths, COVID-19 in the United States, and COVID-19 cases in the rest of the world. They also conducted a sentiment analysis, which is a natural language processing method, by applying the NRC Emotion Lexicon (Mohammad and Turney, 2013).

In summary, existing research has already examined Twitter data through ML approaches such as LDA models. The Twitter data from the #twlz teacher community concerning COVID-19-related topics was analyzed using the statistical approach of tf-idf analysis together with manual content analysis. However, there has been no analysis using ML to highlight the main topic changes within the #twlz teacher community over the course of the pandemic. Specifically, there has been no comparison of the content from before the pandemic to the content during the first and second school closures. A comparison of results from different studies using various methods could also provide new methodological insights.

3 METHODOLOGY

In this section, we first describe our mixed-methods research approach. We then explain our process of data collection, data preparation, and modeling.

3.1 Research Approach

As an overall research approach, we apply the Design Science Research Methodology (Peffers et al., 2007). Accordingly, we go through the following steps: We identify the problem through a literature review, define objectives for problem-solving through our study design, design and develop a solution for the problem by training an LDA model with our Twitter data set, demonstrate the solution by inferring themes manually and with the help of ChatGPT-3.5 (OpenAI, 2022), evaluate the solution by comparing our results with the findings of Fütterer et al. (2021), and communicate the problem and its solution with this work.

Figure 1: We cut out three subsets of data (P1, P2, and P3) from the raw data set.

From our literature review and the related work (Fütterer et al., 2021; Xue et al., 2020a; Xue et al., 2020b), we derive the following research questions (RQ), which include both content (RQ1–RQ3) and technical (RQ4) perspectives:

RQ1. What topics were frequently being discussed using the hashtags #twitterlehrerzimmer and #twlz in January and February 2020, before the pandemic?

RQ2. What topics were frequently being discussed during the first nationwide school closures in March and April 2020, and are there obvious changes from the period of January and February 2020?

RQ3. Are there any differences between the topics that were being discussed during the first nationwide school closures (March and April 2020) and the second school closures (April and May 2021)?

RQ4. Comparing our methods with those of the existing study (Fütterer et al., 2021), are there any substantive differences in terms of the results?

Since there is no publicly available data set to answer our research questions, we collected and curated our own data set from Twitter.

3.2 Data Collection

Using the Twitter API (Twitter, 2023) and the twarc2 Python library (twarc, 2024), we generate a raw data set of 152,865 tweets with the hashtags #twitterlehrerzimmer and/or #twlz. We did not include retweets. The tweets are anonymized during data preparation for ethical reasons (Webb et al., 2017). To answer our research questions, we chose the time period from January 6, 2020 (two months before the first school closures) to May 23, 2021 (second school closures). This resulted in three subsets of data (P1, P2, P3), each covering a period of 42 days (see also Figure 1), as in the study by Fütterer et al. (2021):

P1. from January 6 to February 17, 2020, when no COVID-19 measures were in effect in Germany, as the first measures were taken in schools by the government on March 16, 2020 (5,229 tweets)

P2. from March 16 to April 27, 2020, when all schools in Germany closed (11,137 tweets)

P3. from April 12 to May 23, 2021, when the schools closed again or organized rotating classes (14,485 tweets)
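The assignment of tweets to the three subsets can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our collection code: the names PERIODS, assign_period, and split_by_period are hypothetical, tweets are assumed to be available as (date, text) pairs, and both period boundaries are treated as inclusive.

```python
from datetime import date

# Period boundaries as stated in the list above.
# Assumption: both the start and the end date belong to the period.
PERIODS = {
    "P1": (date(2020, 1, 6), date(2020, 2, 17)),
    "P2": (date(2020, 3, 16), date(2020, 4, 27)),
    "P3": (date(2021, 4, 12), date(2021, 5, 23)),
}


def assign_period(created):
    """Return the period label for a tweet date, or None if it lies outside."""
    for label, (start, end) in PERIODS.items():
        if start <= created <= end:
            return label
    return None


def split_by_period(tweets):
    """Group (date, text) pairs into the three subsets of data."""
    subsets = {label: [] for label in PERIODS}
    for created, text in tweets:
        label = assign_period(created)
        if label is not None:
            subsets[label].append(text)
    return subsets
```

Tweets published between the periods (for example, in early March 2020) are dropped by this sketch, which mirrors the idea of cutting three windows out of the raw data set.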

3.3 Data Preparation

In line with previous studies (Fütterer et al., 2021; Xue et al., 2020a; Xue et al., 2020b), we clean the subsets of data P1, P2, and P3 to prepare them for further processing by removing:

• all runs of more than one space character, all single characters, and all URLs (to remove irrelevant information)

• all numbers, punctuation, and special characters (to remove irrelevant information such as emojis)

• all @-mentions of persons (to remove irrelevant information and make tweets anonymous)

• #-characters (to consider the word of the used hashtag as a topic as well)

• all * and : (to convert inclusive German language forms into standard feminine forms without special characters, e.g., Lehrer*innen becomes Lehrerinnen)

• stop words according to the NLTK Python library (Aarsen et al., 2023), plus the following additional words: twitterlehrerzimmer, twlz, gehen, ja, nein, ab, für, hallo, and liebes (to remove words with less informative value, such as articles, pronouns, and prepositions)
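The cleaning steps listed above can be approximated with regular expressions from the Python standard library. The following sketch is illustrative only: clean_tweet and STOP_WORDS are hypothetical names, the small stop word set merely stands in for the NLTK German stop word list, and the order of the steps is an assumption (URLs must be removed before the colons, for instance).

```python
import re

# Hypothetical stand-in for the NLTK German stop word list plus the
# additional words named in the text.
STOP_WORDS = {
    "der", "die", "das", "und", "für", "den", "von", "bei", "ist",
    "twitterlehrerzimmer", "twlz", "gehen", "ja", "nein", "ab",
    "hallo", "liebes",
}


def clean_tweet(text):
    """Apply the cleaning steps to one tweet and return the cleaned string."""
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove @-mentions
    text = re.sub(r"[*:]", "", text)            # Lehrer*innen -> Lehrerinnen
    # Replace numbers, punctuation, emojis, and '#' with spaces; the hashtag
    # word itself is kept.
    text = re.sub(r"[^A-Za-zÄÖÜäöüß\s]", " ", text)
    # split() collapses repeated whitespace; drop single characters and
    # stop words.
    words = [
        word for word in text.split()
        if len(word) > 1 and word.lower() not in STOP_WORDS
    ]
    return " ".join(words)


example = "Danke @lehrer_x! Tipps für Lehrer*innen: https://t.co/abc #twlz #Homeschooling 2021"
print(clean_tweet(example))
```

For the hypothetical example tweet, the sketch keeps only the words Danke, Tipps, Lehrerinnen, and Homeschooling.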

Then, to split the tweets (strings) into single words (sub-strings), we apply tokenize from the NLTK Python library (Aarsen et al., 2023) to each subset of data, or corpus (P1, P2, and P3). From these sub-strings, we build bigrammed tweets using models.phrases of the gensim Python library (Řehůřek, 2022). This means finding sequences of two contiguous words to analyze their relationships and probabilities. These word pairs are combined into one word joined by an underscore. Furthermore, we apply the HanoverTagger (Wartena, 2019) to lemmatize the tweets, which changes the words into their basic forms. An example of the data preparation process is shown in Table 1.

Table 1: An example of our data preparation process for a tweet, from the original to the tokenized, bigrammed, and lemmatized tweet.

Original tweet:
https://t.co/VcdT4Za2WD Auch für den #Impfstoff von #Moderna soll die Zulassung für #Kinder ab 12 Jahre bei der #EMA beantragt werden. Eine gute Nachricht, je mehr zugelassen ist, umso schneller sollten die #Impfungen klappen. #SichereBildung #twlz #Schulen #ImpfenRettetLeben

Tokenized tweet:
['Impfstoff', 'Moderna', 'Zulassung', 'Kinder', 'Jahre', 'EMA', 'beantragt', 'gute', 'Nachricht', 'je', 'mehr', 'zugelassen', 'umso', 'schneller', 'sollten', 'Impfungen', 'klappen', 'SichereBildung', 'Schulen', 'ImpfenRettetLeben']

Bigrammed tweet:
['Impfstoff', 'Moderna', 'Zulassung', 'Kinder', 'Jahre', 'EMA', 'beantragt', 'gute_Nachricht', 'je', 'mehr', 'zugelassen', 'umso', 'schneller', 'sollten', 'Impfungen', 'klappen', 'SichereBildung', 'Schulen', 'ImpfenRettetLeben']

Lemmatized tweet:
['Impfstoff', 'Moderna', 'Zulassung', 'Kind', 'Jahr', 'ema', 'beantragen', 'Gute_nachricht', 'je', 'mehr', 'zulassen', 'umso', 'schnell', 'sollen', 'Impfung', 'Klappen', 'Sicherebildung', 'Schule', 'Impfenrettetlebe']
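The idea behind the bigram step can be sketched without external libraries. Note that this is a simplified stand-in and not the gensim method: gensim scores candidate pairs statistically, whereas the hypothetical function merge_frequent_bigrams below merges every adjacent pair that reaches a plain count threshold (min_count is an assumed parameter).

```python
from collections import Counter


def merge_frequent_bigrams(token_lists, min_count=2, delimiter="_"):
    """Join adjacent word pairs that occur at least min_count times."""
    pair_counts = Counter(
        pair for tokens in token_lists for pair in zip(tokens, tokens[1:])
    )
    frequent = {pair for pair, n in pair_counts.items() if n >= min_count}
    merged = []
    for tokens in token_lists:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in frequent:
                out.append(tokens[i] + delimiter + tokens[i + 1])
                i += 2  # skip the second half of the merged pair
            else:
                out.append(tokens[i])
                i += 1
        merged.append(out)
    return merged


# Hypothetical toy corpus of two tokenized tweets.
toy = [
    ["gute", "Nachricht", "Impfungen", "klappen"],
    ["eine", "gute", "Nachricht", "heute"],
]
print(merge_frequent_bigrams(toy))
```

In the toy corpus, only the pair gute and Nachricht occurs twice, so it is the only pair merged into a single token with an underscore.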

3.4 Modeling

For topic modeling, we use the LDA (Blei et al., 2003) model, as it has already been applied in the context of tweet topic analysis (Xue et al., 2020a; Xue et al., 2020b). This algorithm considers each document (i.e., tweet) as a mixture of latent topics and calculates the weights of the topics within the document as well as their probability of appearance over the whole corpus (i.e., subset of data).

First, a vector called bag of words is generated for each subset of data. It stores the words of each corpus and their frequencies.
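As a minimal illustration of the bag-of-words representation, the following sketch maps each word to an integer id and counts its occurrences per tweet. The result has a shape similar to the (word id, count) pairs that gensim's Dictionary.doc2bow returns, but the function names build_vocabulary and to_bag_of_words and the toy corpus are hypothetical.

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Map each distinct word to an integer id (sorted for reproducibility)."""
    words = sorted({word for tokens in token_lists for word in tokens})
    return {word: idx for idx, word in enumerate(words)}


def to_bag_of_words(tokens, vocabulary):
    """Return sorted (word_id, count) pairs for one tokenized tweet."""
    counts = Counter(vocabulary[word] for word in tokens if word in vocabulary)
    return sorted(counts.items())


# Hypothetical toy corpus of two lemmatized tweets.
toy_corpus = [["Schule", "Corona", "Schule"], ["Corona", "Online"]]
vocab = build_vocabulary(toy_corpus)
bows = [to_bag_of_words(tokens, vocab) for tokens in toy_corpus]
```

Word order is discarded in this representation; only the frequencies remain, which is the input an LDA model works with.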

Second, for each corpus, an optimal number of topics is defined by calculating the perplexity and the coherence score (see Table 2). The goal is to find the number of topics that has the lowest perplexity and the highest coherence score at the same time. Hence, we defined the following optimal numbers of topics: P1 = 4 topics, P2 = 3 topics, and P3 = 2 topics. For each topic, the top 15 related keywords are listed in descending order according to rank. To extract the most dominant topic from each time period, we calculate the topic weightage per tweet and put it in relation to the number of all the tweets within the specific time period.
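The last step, extracting the dominant topic, can be sketched as follows. We assume here that the trained model yields, for every tweet, a list of (topic id, weight) pairs; the functions dominant_topic and dominant_topic_shares and the toy weights are hypothetical and only illustrate the calculation.

```python
from collections import Counter


def dominant_topic(topic_weights):
    """Return the topic id with the largest weight for one tweet."""
    return max(topic_weights, key=lambda pair: pair[1])[0]


def dominant_topic_shares(per_tweet_weights):
    """Relate the dominant-topic counts to the number of all tweets."""
    counts = Counter(dominant_topic(weights) for weights in per_tweet_weights)
    total = len(per_tweet_weights)
    return {topic: n / total for topic, n in counts.items()}


# Hypothetical per-tweet topic weights for four tweets and three topics.
toy_weights = [
    [(1, 0.7), (2, 0.2), (3, 0.1)],
    [(1, 0.1), (2, 0.3), (3, 0.6)],
    [(1, 0.5), (2, 0.4), (3, 0.1)],
    [(1, 0.8), (2, 0.1), (3, 0.1)],
]
shares = dominant_topic_shares(toy_weights)
```

In the toy data, topic 1 is dominant in three of the four tweets and topic 3 in one, while topic 2 is never dominant and therefore does not appear in the result.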

3.5 Analysis of the Topics

First, two of our project members manually analyze the listed keywords to define an overall theme for each topic, and we merge their results into a common theme. We then use ChatGPT-3.5 to examine the extracted keywords. All prompts are documented in our research protocol (Weigand et al., 2024). This additional approach serves as an audit of our manually extracted themes.

4 RESULTS

To answer our research questions, we use the subsets of data P1, P2, and P3, along with their individual optimal numbers of topics. For each subset of data, the LDA model returns the 15 highest weighted lemmatized keywords for each topic. These keywords provide insights into the content of each topic (see Table 3).

Concerning RQ1 and the topics discussed before the first measures were taken, we investigate the subset of data P1. In P1, topic 1 (see Table 3) appears as the dominant topic in n = 4,168 tweets. By contrast, topic 2 is dominant in n = 283 tweets, topic 3 in n = 516 tweets, and topic 4 in n = 262 tweets. To make the keywords easier to understand, we abstract them into themes based on our understanding and with the help of ChatGPT (see Table 4). Overall, the keywords of topic 1 indicate that general idea exchange within the community was in high demand before the COVID-19 pandemic. ChatGPT summarizes this topic as education and learning. We summarize the keywords of topic 2 as dealing with new and up-to-date concepts for digital education, while ChatGPT abstracts it as digital transformation in education. We cluster topic 3 in a group related to school projects and additional wishes, especially in Bavaria, while ChatGPT labels it as education initiatives and collaboration. From our perspective, topic 4 is about teachers’ lives, while ChatGPT summarizes it as teaching and learning dynamics. ChatGPT describes P1 as education and learning environments.

Table 2: Results of the perplexity and the coherence score for P1, P2, and P3 to define the optimal number of topics for each subset of data (the chosen number per subset is marked with *).

        P1                       P2                       P3
Topics  Perplexity  Coherence    Perplexity  Coherence    Perplexity  Coherence
2       -8.973      0.276        -8.852      0.315        -9.049      0.365*
3       -9.151      0.257        -8.995      0.323*       -9.244      0.311
4       -9.294      0.332*       -9.118      0.292        -9.385      0.297
5       -9.437      0.306        -9.242      0.29         -9.512      0.294
6       -9.571      0.33         -9.358      0.292        -9.634      0.293
7       -9.723      0.318        -9.486      0.303        -9.792      0.296
8       -9.913      0.405        -9.659      0.329        -9.969      0.35
9       -10.219     0.393        -9.884      0.335        -10.236     0.33
10      -10.598     0.426        -10.224     0.294        -10.599     0.338
11      -11.174     0.384        -10.718     0.358        -11.144     0.279
12      -11.998     0.441        -11.473     0.314        -11.981     0.308

Table 3: LDA results for all subsets of data (P1, P2, and P3) and their topics. In P1, topic 1 is the dominant topic, i.e., the one that occurs most often. In P2, topic 3 is the dominant topic. In P3, the two topics are almost evenly distributed among the tweets.

Subset  Topic  Lemmatized keywords within topic
P1      1      Schule, geben, Mal, schon, jemand, Thema, danke, Frage, heute, amp, Sus, Neue, Idee, mehr, viel
P1      2      Unterricht, Digitalebildung, neu, digital, Arbeit, Lehrkraft, Bildung, erstellen, Tolle, Edupnx, warum, statt, Zeit, müssen, Medium
P1      3      Jahr, Bayernedu, erst, finden, gut, haben, gerne, Gute, wer, Klasse, Projekt, dabei, einfach, Wunsch, vielleicht
P1      4      Lehrerleben, Schüler, Lehrer, Schülerin, immer, Lehrerin, freuen, sein, tipps, kommen, gleich, Mensch, können, Podcast, letzt
P2      1      digital, Idee, viel, amp, Gute, Schüler, schon, Aufgabe, Frage, Unterricht, gerne, jemand, Schulschließung, Lehrer, tipps
P2      2      Coronaviru, Neue, müssen, immer, kommen, Server, Plattform, kostenlos, Spiel, Kurs, letzt, schnell, Via, Geben, Twitter
P2      3      Schule, Corona, Mal, Sus, Zeit, Online, heute, Schulschliessung, Lernen, geben, Homeschooling, gerade, covid, gut, Schülerin
P3      1      amp, Bildung, Thema, Unterricht, Uhr, Schule, geben, Bayernedu, digital, freuen, Lernen, Online, Tool, Idee, Moodle
P3      2      Schule, Mal, Test, Kind, Inzidenz, Sus, heute, mehr, schon, gut, Klasse, Corona, Woche, sein, haben

To evaluate RQ2 and the topics discussed during the first nationwide school closures, we examine the subset of data P2. Three topics are identified for P2 (see Table 3). Topic 3 occurs as the dominant topic in n = 11,136 tweets. Topic 1 is dominant in only n = 1 tweet, and topic 2 is never the dominant topic. From our perspective, the keywords of the dominant topic 3 during the first nationwide school closures due to the COVID-19 pandemic in March and April 2020 are related to good online education during homeschooling and school closures. ChatGPT summarizes this as education amidst the pandemic. Topic 1 relates to the search for advice-related tasks in digital education during school closures. ChatGPT abstracts this as education in the digital age. Regarding topic 2, the keywords imply discussion about (free) platforms and tools during the COVID-19 pandemic. ChatGPT calls this adapting to change in the pandemic era. ChatGPT describes P2 as adapting education in the face of crisis.

Regarding RQ3, we analyze the subset of data P3, which contains the topics discussed in April and May 2021. The two topics of P3 (see Table 3) are relatively even in terms of their distribution: Topic 1 is dominant in n = 6,319 tweets, and topic 2 is dominant in n = 8,166 tweets. We summarize the keywords of topic 1 as digital education and tools, especially in Bavaria, which ChatGPT labels as digital education in Bavaria. Regarding the keywords of topic 2, we find the overall theme to be school life during the COVID-19 pandemic influenced by the current incidence levels, while ChatGPT characterizes it as schooling during the pandemic: challenges and adaptations. In summary, ChatGPT describes P3 as digital education and pandemic adaptations.
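The dominant-topic counts reported in this section can be put in relation to the size of each subset with a short calculation. The sketch below uses only the counts stated above; the names REPORTED_COUNTS and topic_shares are hypothetical, and the percentages are rounded to one decimal place.

```python
# Dominant-topic counts per subset as reported in the text above.
REPORTED_COUNTS = {
    "P1": {1: 4168, 2: 283, 3: 516, 4: 262},
    "P2": {1: 1, 2: 0, 3: 11136},
    "P3": {1: 6319, 2: 8166},
}


def topic_shares(counts):
    """Return the share of each topic in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {topic: round(100 * n / total, 1) for topic, n in counts.items()}


for period, counts in REPORTED_COUNTS.items():
    print(period, sum(counts.values()), topic_shares(counts))
```

The totals reproduce the subset sizes (5,229, 11,137, and 14,485 tweets). Topic 1 is dominant in roughly 80% of the P1 tweets, topic 3 in nearly all P2 tweets, and the two P3 topics split at roughly 44% and 56%.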


Table 4: Themes of the LDA results defined by the authors and ChatGPT per topic of the subsets of data P1, P2, and P3. * CG = ChatGPT.

P1 (overall theme by CG*: Education and learning environments)
  Topic 1  Authors: General idea exchange within the community
           CG*:     Education and learning
  Topic 2  Authors: New and up-to-date concepts for digital education
           CG*:     Digital transformation in education
  Topic 3  Authors: School projects and additional wishes, especially in Bavaria
           CG*:     Education initiatives and collaboration
  Topic 4  Authors: Teachers’ life
           CG*:     Teaching and learning dynamics

P2 (overall theme by CG*: Adapting education in the face of crisis)
  Topic 1  Authors: Search for advice related tasks in digital education during school closures
           CG*:     Education in the digital age
  Topic 2  Authors: (Free) platforms and tools during the COVID-19 pandemic
           CG*:     Adapting to change in the pandemic era
  Topic 3  Authors: Good online education during homeschooling and school closures
           CG*:     Education amidst the pandemic

P3 (overall theme by CG*: Digital education and pandemic adaptations)
  Topic 1  Authors: Digital education and tools, especially in Bavaria
           CG*:     Digital education in Bavaria
  Topic 2  Authors: School life during the COVID-19 pandemic influenced by the current incidence levels
           CG*:     Schooling during the pandemic: challenges and adaptations

5 DISCUSSION

To discover the dominant topics within the #twlz teacher community at different times during the COVID-19 pandemic (RQ1–RQ3), we examined three Twitter subsets of data (P1, P2, P3). In addition, we reflected on the different methodologies used to examine these research questions (RQ4).

For RQ1, we examined P1 to find topics that were discussed using the hashtags #twitterlehrerzimmer and #twlz before the COVID-19 pandemic in January and February 2020. From our understanding, complemented by ChatGPT, the general exchange of ideas about education and learning was prevalent in the #twlz community at that time. There was also a focus on digital transformation of the educational environment. Education initiatives and collaboration, especially in Bavaria, as well as teaching and learning dynamics, were also topics of discussion within the community. The analysis of Fütterer et al. (2021) also shows that digital education was discussed before the school closures in Germany and that education and learning, classes and school life, and the educational revolution and crisis were also common themes. The results regarding the topics education and learning and digital transformation of education are nearly identical in the two analyses. In addition, education initiatives and collaboration (in relation to Bavaria) and teaching and learning dynamics may be similar to classes and school life in the results of Fütterer et al. (2021). Although our topic analysis did not reveal any discussion related to educational revolution or crisis, we only examined the four highest weighted topics within P1, so this topic may simply not have been captured.

The insights for RQ2 regarding the topics discussed using the hashtags #twitterlehrerzimmer and #twlz during the first nationwide school closures due to the COVID-19 pandemic in March and April 2020 were extracted from our subset of data P2. In Germany, different approaches were applied in different regions due to the federal governance of education (Huber, 2021). According to our findings, complemented by ChatGPT, the discussion within the #twlz community was still about digital education, but it was focused more specifically on online education in times of homeschooling and school closures due to the COVID-19 pandemic. The discussion also included an exchange on (free) platforms and tools to adapt to the situation. Fütterer et al. (2021) also found that digital education was an ongoing topic, but their set of topics also included specific software and tools for teaching and learning and distance learning or homeschooling. Furthermore, they found that mutual help was essential in these times, though this is more of an implicit topic. They extracted the main challenges and opportunities through a manual content analysis. They found the following challenges: good digital classes, missing software, and the lack of digital know-how for digital teaching. Opportunities included networking and sharing possibilities, offering digital material, and explanations or tricks. Concerning the second part of our RQ2 (whether there are obvious changes from the period of January and February 2020), we determine that the exchange within the #twlz community became more specific during the COVID-19 pandemic. Online education became relevant overnight, and teachers were expected to change how they taught. This caused them to have more concrete questions about online education and homeschooling and therefore to look for help, exchange, and tips within the #twlz community.


Regarding the second school closures in April and May 2021 (RQ3), we found two more or less evenly distributed topics. Schools were affected by the so-called “federal emergency brake” (German: “Bundesnotbremse”), which limited in-classroom teaching to schools in counties with COVID-19 incidence levels below 200 and then 165 in the respective county (Grill, M., Mascolo, G., Munzinger, P., Zick, T., 2022). According to our findings, the main topics were digital education and tools, especially in Bavaria, and school life during the pandemic. Both were influenced by the fluctuations in incidence levels. Since Fütterer et al. (2021) published their work in 2021, it does not cover this period. In contrast to the period of the first nationwide school closures in Germany (March and April 2020, P2), the topics discussed in relation to school life became even more precise in terms of the requirements for schooling in times of short-term adjustments based on incidence levels. Again, we have a reference to a specific region (Bavaria), which may indicate that COVID-19 measures were especially strong in Bavaria during this time, increasing the local exchange on the topic. However, we can already see the Bavarian influence before the pandemic in topic 3 of P1, so we assume there is a strong #twlz community in Bavaria.

In terms of content, our findings show that the topics did not change completely over time (from P1 to P2 to P3). Before the COVID-19 pandemic, the exchange in the #twlz community was about education and learning in general. Digital education played a role, but it was not the dominant topic. Over time, the exchange shifted toward digital education, the tools needed, and how to adapt schooling to settings such as homeschooling or short-term changes in educational conditions. This is also underlined by the overall themes of ChatGPT, which range from education and learning environments (P1) to adapting education in the face of crisis (P2) to digital education and pandemic adaptations (P3).

Regarding RQ4, we have the following insights: It is not always possible to understand a complete data set in a reasonable amount of time. Hence, techniques such as ML are helpful for reducing the time, funding, and personnel needed. Although the data come from different sources, a comparison of our findings, complemented by ChatGPT, with the topics extracted by Fütterer et al. (2021) for the periods before and during the first nationwide school closures in Germany reveals no major differences.

Given the time-sensitive nature of research and the urgent need for results [e.g., health-related analysis (Rauschenberger and Baeza-Yates, 2020)], using methods such as those suggested by Mayring (Mayring, 2015) or the examination of detailed bigram networks may not be ideal because these methods are resource-intensive even for smaller data sets. Since the outcomes are comparable in our case, less time-intensive methods (such as using the ChatGPT interface) may be preferable, especially when resources are limited. In addition, since the results are similar, ChatGPT can complement the researchers’ point of view with its objectivity. ChatGPT also summarizes the topics in a shorter and more precise way, so the combination of both perspectives can enrich the result.

Our findings are limited by the fact that we do not have the same raw Twitter data set as Fütterer et al. (2021), which may be due to the deletion of user accounts. Since ChatGPT can only handle a limited amount of data input, we only used the lemmatized keywords within topics as input. Therefore, ChatGPT only had a very narrow view of the data. In addition, school closures in period P3 were not the same for each region. This may have affected the urgency or time users spent on Twitter in general. We did not find any major effect on the topics themselves but rather on the number of tweets (P1: 5,229 tweets; P2: 11,137 tweets; P3: 14,485 tweets). Finally, there are various biases in the data sets, and it is important to consider that Twitter is not representative of the entire population (Graells-Garrido et al., 2019). These biases must be acknowledged and addressed when utilizing these insights for decision-making.

6 CONCLUSION

We conducted an analysis of our Twitter data set from three distinct time periods (before school closures, during the first nationwide COVID-19 school closures in Germany, and during the second school closures) within the #twitterlehrerzimmer or #twlz community. We used ChatGPT to extract themes and compared the outcomes with those of a previous study. The results from various research methodologies yielded similar insights regarding the exchanges of teachers on Twitter. In addition, we observed that ChatGPT provides comparable results with greater ease of use and less effort.

The next step is to conduct a systematic analysis

comparing ML techniques to traditional manual

methods to explore their respective limitations in

content analysis, whether for small or large data

sets. Furthermore, the limitations of using ChatGPT

in terms of reliability and accuracy should be the

subject of further investigation.

WEBIST 2024 - 20th International Conference on Web Information Systems and Technologies


ACKNOWLEDGMENTS

In our work, we used ChatGPT prompts as a tool to abstract themes from specific keywords. This research was supported by the EQUAVEL project PID2022-137646OB-C31, funded by MICIU/AEI/10.13039/501100011033 and by FEDER, UE.

REFERENCES

Aarsen et al. (2023). Documentation Natural Language Toolkit. Accessed July 3, 2023, from https://www.nltk.org/.

Arya, N. (2022). TF-IDF Defined. Accessed April 11, 2024, from https://www.kdnuggets.com/2022/10/tfidf-defined.html.

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.

Braun, V. and Clarke, V. (2006). Using thematic analysis

in psychology. Qualitative Research in Psychology,

3(2):77–101. DOI: 10.1191/1478088706qp063oa.

Britannica, The Editors of Encyclopaedia (February 9,

2024). X. Accessed April 10, 2024, from

https://www.britannica.com/money/Twitter.

Fütterer, T., Hoch, E., Stürmer, K., Lachner, A., Fischer, C., and Scheiter, K. (2021). Was bewegt Lehrpersonen während der Schulschließungen? – Eine Analyse der Kommunikation im Twitter-Lehrerzimmer über Chancen und Herausforderungen digitalen Unterrichts. Zeitschrift für Erziehungswissenschaft : ZfE, 24(2):443–477. DOI: 10.1007/s11618-021-01013-8.

Graells-Garrido, E., Baeza-Yates, R., and Lalmas, M. (2019). How representative is an abortion debate on Twitter? In Boldi, P., editor, Proceedings of the 10th ACM Conference on Web Science, ACM Digital Library, pages 133–134, New York, NY, United States. Association for Computing Machinery.

Grill, M., Mascolo, G., Munzinger, P., and Zick, T. (2022). Deutschlands Problemzone: Corona und Schulen. Accessed April 14, 2024, from https://www.sueddeutsche.de/projekte/artikel/politik/corona-und-die-schulen-deutschlands-problemzone-e671108/.

Huber, S. G. (2021). Schooling and Education in Times of the COVID-19 Pandemic: Food for Thought and Reflection Derived From Results of the School Barometer in Germany, Austria and Switzerland. International Studies in Educational Administration (Commonwealth Council for Educational Administration & Management (CCEAM)), 49(1).

Mayring, P. (2015). Qualitative Content Analysis: Theoretical Background and Procedures. Approaches to Qualitative Research in Mathematics Education, pages 365–380. DOI: 10.1007/978-94-017-9181-6_13.

Mohammad, S. M. and Turney, P. D. (2013). NRC emotion

lexicon. DOI: 10.4224/21270984.

OpenAI (2022). Introducing ChatGPT. Accessed April 15,

2024, from https://openai.com/blog/chatgpt.

Peffers, K., Tuunanen, T., Rothenberger, M. A., and Chatterjee, S. (2007). A Design Science Research Methodology for Information Systems Research. Journal of Management Information Systems, 24(3):45–77. DOI: 10.2753/MIS0742-1222240302.

Rauschenberger, M. and Baeza-Yates, R. (2020). How

to Handle Health-Related Small Imbalanced Data in

Machine Learning? i-com, 19(3):215–226. DOI:

10.1515/icom-2020-0018.

Řehůřek, R. (December 21, 2022). GENSIM: Topic modelling for humans. Accessed July 3, 2023, from https://radimrehurek.com/gensim/index.html.

twarc (2024). twarc2. Accessed March 25, 2024, from https://twarc-project.readthedocs.io/en/latest/twarc2_en_us/.

Twitter (2023). Twitter API. Accessed July 8, 2023, from

https://developer.twitter.com/en/docs/twitter-api.

Wartena, C. (2019). A Probabilistic Morphology Model for

German Lemmatization. DOI: 10.25968/OPUS-1527.

Webb, H., Jirotka, M., Stahl, B. C., Housley, W., Edwards, A., Williams, M., Procter, R., Rana, O., and Burnap, P. (2017). The ethical challenges of publishing Twitter data for research dissemination. In Boldi, P., editor, Proceedings of the 2017 ACM on Web Science Conference, ACM Digital Library, pages 339–348, New York, NY. ACM. DOI: 10.1145/3091478.3091489.

Weigand, A. C., Jacob, M. F., Rauschenberger, M., and Escalona Cuaresma, M. J. (2024). Research Protocol for Analyzing Tweets Using Topic Modeling and ChatGPT: What We Can Learn About Teachers and Topics During COVID-19 Pandemic-Related School Closures. DOI: 10.13140/RG.2.2.12205.91367.

Xue, J., Chen, J., Chen, C., Zheng, C., Li, S., and Zhu, T. (2020a). Public discourse and sentiment during the COVID 19 pandemic: Using Latent Dirichlet Allocation for topic modeling on Twitter. PloS one, 15(9):e0239441. DOI: 10.1371/journal.pone.0239441.

Xue, J., Chen, J., Hu, R., Chen, C., Zheng, C., Su, Y., and Zhu, T. (2020b). Twitter Discussions and Emotions About the COVID-19 Pandemic: Machine Learning Approach. Journal of medical Internet research, 22(11):e20550. DOI: 10.2196/20550.
