Prompt Distillation for Emotion Analysis

Andrew L. Mackey, Susan Gauch and Israel Cuevas

Deparment of Electrical Engineering and Computer Science, University of Arkansas , Fayetteville, Arkansas, U.S.A.

{almackey, sgauch, ibcuevas}@uark.edu

Keywords:

Emotion Analysis, Natural Language Processing.

Abstract:

Emotion Analysis (EA) is a ﬁeld of study closely aligned with sentiment analysis whereby a discrete set of

emotions are extracted from a given document. Existing methods of EA have traditionally explored both

lexicon and machine learning techniques for this task. Recent advancements in large language models have

achieved success in a wide range of tasks, including language, images, speech, and videos. In this work, we

construct a model that applies knowledge distillation techniques to extract information from a large language

model which instructs a lightweight student model to improve its performance with the EA task. Speciﬁcally,

the teacher model, which is much larger in terms of parameters and training inputs, performs an analysis

of the document and shares this information with the student model to predict the target emotions for a given

document. Experimental results demonstrate the efﬁcacy of our proposed prompt-based knowledge distillation

approach for EA.

1 INTRODUCTION

Sentiment analysis (SA) is a prominent subﬁeld of

natural language processing (NLP) with the goal of

analyzing text documents from which the document’s

polarity is obtained. Emotion analysis (EA) estab-

lishes additional granularity for classes beyond polar-

ity from SA by focusing on the alignment of language

with various emotional categories. For example, the

Paul Ekman model for emotions deﬁnes six primary

emotion categories: anger, disgust, fear, joy, sadness,

and surprise (Ekman and Friesen, 1971). Another

approach to illustrate the various emotional dimen-

sions was proposed as the Robert Plutchik model with

eight primary bipolar emotions: anger versus fear, joy

versus sadness, anticipation versus surprise, and trust

versus disgust (Plutchik, 1982). Additional models

have been proposed that projects emotions into a di-

mensional space, such as for valence, arousal, and

dominance (Russell and Mehrabian, 1977).

Various techniques have been proposed for the

task of emotion analysis. The ﬁrst major area of emo-

tion analysis involves lexicon-based techniques where

the techniques are focused on aligning the emotional

categories of language with the speciﬁc words that

were used (Baccianella et al., 2010) (Staiano and

Guerini, 2014). The next major area of emotion anal-

ysis includes various machine learning techniques

that discover latent patterns or representations for the

detection of different emotional categories (Agrawal

and An, 2012) (Calefato et al., 2018) (Hasan et al.,

2019). Some researchers have investigated emotion

representations that seek to achieve emotion represen-

tations that transcend multiple lexicons and datasets

(Buechel et al., 2020). Some work in emotion clas-

siﬁcation has concentrated on aligning transformer-

based architectures with emotional categories through

deep contextual representations. Pretrained language

models (PLM) have demonstrated various successes

in outperforming many state-of-the-art techniques in

the ﬁeld. As the parameters and training data contin-

ued to scale for PLMs, large language models (LLM)

emerged and demonstrated capabilities not seen in

prior work, such as prompt-based learning and rea-

soning.

In this paper, we introduce a prompt-based knowl-

edge distillation model for emotion analysis where the

prompt serves as source of knowledge through which

we distill that information for a student model un-

der the supervision of a much larger teacher model.

The ﬁrst phase of our model involves a prompt-

based teacher model followed by a knowledge distil-

lation student training model. The teacher model uses

prompt-based techniques to extract information from

the LLM. The student model uses a transformer-based

PLM where probabilities from both teacher and stu-

dent models are aligned so that the student model is

capable of generating similar probability distributions

as the teacher model.

328

Mackey, A., Gauch, S. and Cuevas, I.

Prompt Distillation for Emotion Analysis.

DOI: 10.5220/0012951200003838

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2024) - Volume 1: KDIR, pages 328-334

ISBN: 978-989-758-716-0; ISSN: 2184-3228

Figure 1: Overview architecture of the model. We combine a pre-trained language model with a large language model to

extract the emotion embeddings cross-corpus to perform a classiﬁcation of the emotions. For the ﬁnal prediction y, we

localize the classiﬁcation head to a set of possible classes for the respective datatset.

2 RELATED WORK

Recent work in the research community has focused

on tasks involving emotion analysis has concentrated

primarily on PLMs for learning contextual represen-

tations using neural networks (Demszky et al., 2020)

(Turcan et al., 2021) (Alhuzali and Ananiadou, 2021)

(Wullach et al., 2021) (Mackey et al., 2021) (Tora-

man et al., 2022) (Rahman et al., 2024). PLMs un-

dergo various training methods which enables them to

learn latent contextual representations of text. These

models are generally ﬁne-tuned in order to adapt to

task-speciﬁc objectives, such as emotion classiﬁca-

tion. Bidirectional Encoder Representations from

Transformers (BERT) is a transformer-based archi-

tecture that bidirectionally encoded embeddings to

learn contextual information in textual data where the

model was pre-trained simultaneously on the tasks

of masked language modeling (MLM) and next sen-

tence prediction (NSP) (Devlin et al., 2019). XL-

Net improves upon BERT by introducing permuta-

tion language modeling where tokens are predicted

in a random order (Yang et al., 2019). RoBERTa

improved upon BERT by modifying the training ap-

proach where the NSP task was removed and dynamic

masking was introduced, and increasing the amount

of training data that was used (Liu et al., 2019).

Other work with LMs has resulted in different

techniques for training methodologies. Knowledge

distillation techniques, where a teacher model trans-

fers knowledge from a complex model to a much sim-

pler model, how shown promising results across dif-

ferent studies (Hinton et al., 2015) (Lukasik et al.,

2022). Brown et al. deﬁne various levels of data used

for in-context learning, such as ﬁne-tuning (updating

weights of a pretrained model), few-shot (models are

provided a few demonstrations of a task with no addi-

tional weight updates to the model), one-shot (only

one demonstration is permitted), and zero-shot (no

demonstrations are permitted) (Brown et al., 2020).

Brown et al. also demonstrate that as LMs increase in

scale, their task-agnostic few-shot performance also

increases (Brown et al., 2020). In addition, Halder

et al. acknowledged that tranformer-based LMs ﬁne-

tuned to task-speciﬁc objectives curtail their ability to

perform well in zero-shot, one-shot, or few-shot sce-

narios (Halder et al., 2020).

Work involving large language models continues

to demonstrate their task-agnostic capabilities. One

study demonstrated a technique of applying a series

of reasoning steps named chain of thought where

an LLM utilized chain-of-thought prompting that

demonstrated reasoning abilities provided the LLM is

adequately large (Wei et al., 2022). Adversarial dis-

tillation frameworks have also been proposed in re-

search literature for improved knowledge distillation

and transfer learning (Jiang et al., 2023).

3 PROBLEM DEFINITION

Let D represent a dataset comprised of N documents,

where each document in D consists of textual infor-

mation and emotion labels. We observe the following

for each D: (1) the set of text documents in dataset

D is represented as X

such that |X

| = N; (2) the set

of possible target labels for dataset D is represented

as Y

where |Y

| = C different emotions; and (3) D

is represented as the following set in the single-label

Prompt Distillation for Emotion Analysis

329

setting:

D = {(x, y) | x ∈ X

and y ∈ Y

} (1)

and the following serves as the representation for a

multi-label setting:

D = {(x, y) | x ∈ X

and y ∈ P (Y

)} (2)

Let D represent the input text corpora where D =

, D

, ..., D

}. The task presented in this work is

to train and align a model to recognize the latent emo-

tion representations in a cross-corpus setting using D

for the purpose of single-class and multi-class emo-

tion classiﬁcation of an emotion label (or set) y from

a given input document x:

y = arg max



Pr(y = c | x; Θ)



(3)

4 PROPOSED APPROACH

We present our proposed solution in this section for

the single-class and multi-class cross-corpora emo-

tion classiﬁcation task. In Figure 2, we provide

an overview of our framework for learning the la-

tent emotion distribution of text documents. There

are three major components to our approach: (1) a

prompt-based knowledge distillation paradigm for ex-

tracting information from an LLM to facilitate the

alignment of a task-speciﬁc model; (2) a task-speciﬁc,

emotion classiﬁcation model that leverages a pre-

trained, transformer-based language model, which is

ﬁne-tuned for the emotion classiﬁcation task; and (3)

a cross-corpora framework for learning latent emotion

representations.

4.1 Prompt-Based Methodology

For a given dataset D = (X

, Y

), each input

and target is represented as (x

doc

, y

emo

) such that

doc

, y

emo

) ∈ D. The target y

emo

of the model is the

emotion class for each document where y

emo

∈ Y

(i.e. anger, grief, disgust, etc.) in the respective

dataset D. To facilitate knowledge distillation from

an LLM, we deﬁne (x

prompt

, y

llm

) to represent the

prompt-based input and label generated from an LLM

for each (x

doc

, y

emo

) ∈ D.

Prompt Template. The following template is used

to the generate each x

prompt

You will be given a human written sentence. Classify

the sentence into one of the following categories:

⟨y

emo

, y

emo

, ...⟩. Return the following format only for

each category as a probability distribution (the sum

should be 1): ⟨y

emo

, probability⟩.

The following is the document: x

The target y

llm

represents the emotion distribution

produced by the LLM for the given input prompt

prompt

, which is modeled as follows:

llm

= Pr(y

emo

| x

prompt

) (4)

= LLM(x

prompt

) (5)

Hallucinations are a known problem in research

literature where an LLM produces a response that is

either factually incorrect or unaligned with the input

prompt it was provided (Farquhar et al., 2024). To ad-

dress the problem of hallucinations, we conduct a val-

idation step for

llm

to ensure the format of the output

is aligned with the targets in the training data. Docu-

ments failing the validation step will undergo a ﬁxed

interval of reprompting where the input and interac-

tions are returned to the LLM for further processing

in the form:

llm

′

= LLM( ⟨x

prompt

′

, ⟨x

prompt

, y

llm

⟩⟩ ) (6)

4.2 Emotion Classiﬁcation Model

The task-speciﬁc emotion classiﬁcation model begins

by employing the use of a transformer-based language

model to provide contextual representations h

emo

⟨h

emo

, h

emo

, ..., h

emo

⟩ for input tokens x

doc

where k

represents the number of time steps. The transformer-

based encoder LM is parameterized with φ for all

datasets D ∈ D to generate the contextualized word

representations h

emo

for each time step i:

emo

= LM

doc

) (7)

The last layer of h

emo

is used to compute the dis-

tribution for the emotion classes, where it is param-

eterized by φ

for each D

∈ D to obtain the target

prediction distribution

emo

and the softmax layer is

applied to normalize the logits:

emo

= Pr(y

emo

| h

emo

) (8)

= Softmax(W

emo

+ b

) (9)

The model shares a common set of parameters φ

between all members of D to facilitate latent emo-

tion representation learning in a cross-domain envi-

ronment, while the task-speciﬁc classiﬁcation head

maintains a speciﬁc set of a parameters φ

4.3 Knowledge Distillation

The goal of a prompt-based teacher model is to extract

knowledge from an LLM and transfer it to the task-

speciﬁc student model, which is responsible for ﬁne-

grained emotion classiﬁcation. The prompt-based

KDIR 2024 - 16th International Conference on Knowledge Discovery and Information Retrieval

330

model instructs the emotion classiﬁcation model to

enable the smaller model to generalize in a manner

that resembles the teacher model. The student model

minimizes a loss function which focuses on both cor-

rectly predicting the target label y

emo

while simulta-

neously aligning the model with the teacher model’s

responses y

llm

The model utilizes a cross-entropy loss function

for the single-class emotion classiﬁcation task

= −

∑

emo

log( ˆy

emo

) (10)

and a binary cross-entropy loss function for multi-

class emotion classiﬁcation

BCE

= −

∑



emo

log( ˆy

emo

) + (1 − y

emo

)log(1 − ˆy

emo

)



(11)

for when there exists multiple emotion labels for a

given document.

We use τ to represent the temperature rate hyper-

parameter to produce a softer probability distribution

over all possible classes for class imbalances through

knowledge distillation techniques. For these models,

the losses from the emotion detection task and the

prompt-based alignment model are summed together

after each batch by using the adjustable hyperparam-

eter λ, which balances the terms below:

= λL

emo

+ (1 − λ)τ

llm

(12)

5 EXPERIMENTS

In this section, we provide an empirical analysis of

our proposed model and investigate the following re-

search questions:

• RQ1: What is the effectiveness of the proposed

model for the emotion classiﬁcation task in terms

of model performance metrics?

• RQ2: Does the choice of LM contribute to the

performance of the proposed model?

• RQ3: How does the knowledge distillation from

an LLM to the proposed model contribute to the

overall performance?

5.1 Data

Our experiments are conducted on two benchmark

datasets: WASSA-21 dataset and Real World Worry

dataset (Buechel et al., 2018) (Kleinberg et al.,

2020). The WASSA-21 dataset was provided in

the 11th Workshop on Computational Approaches

Figure 2: Distribution of the emotion labels by dataset. The

RWW dataset emphasized fear and sadness labels. The

GoEmotions dataset had a stronger presence of documents

labeled as neutral and joy. The WASSA dataset contained

more labels with the sadness and surprise labels in compar-

ison to other datasets.

to Subjectivity, Sentiment, and Social Media Analy-

sis (WASSA) Shared Task: Empathy Detection and

Emotion Classiﬁcation (Tafreshi et al., 2021). The

dataset consists of n = 1860 reactions to news stories

indicating that there is harm to a person, group, or

other. The labels for each record are mapped to seven

emotion categories, which include a neutral category

and Ekman’s basic emotion categories: anger, dis-

gust, fear, joy, sadness, and surprise. This label rep-

resents the dominant emotion for the text.

Table 1: GoEmotions emotion mapping to Ekman emo-

tions.

Emotion Association

anger anger, annoyance, disapproval

disgust disgust

fear fear, nervousness

joy joy, amusement, approval, ex-

citement, gratitude, love, opti-

mism, relief, pride, admiration,

desire, caring

sadness sadness, disappointment, em-

barrassment, grief, remorse

surprise surprise, realization, confusion,

curiosity

The second dataset used in our experiments is

the COVID-19 Real World Worry dataset (Kleinberg

Prompt Distillation for Emotion Analysis

331

Table 2: COVID-19 emotion mapping to Ekman emotions.

Emotion Association

anger anger

disgust disgust

fear fear, anxiety

joy happiness, relaxation

sadness sadness

surprise desire

et al., 2020). The dataset contains n = 2491 records

that were extracted by surveying participants and ask

them to express their emotional feelings towards the

COVID-19 pandemic. Participants were asked to con-

struct two different forms of text. The ﬁrst document

they were asked to author included instructions to ex-

press their feelings towards the then current COVID-

19 situation with a minimum of 500 characters. The

second document expressed them to convey the same

feelings in the form of a social media post that had a

maximum of 240 characters. Participants were asked

to rate their emotions toward the situation and select

one of the following emotions that best represented

their feelings: anger, anxiety, desire, disgust, fear,

happiness, relaxation, and sadness. We used the emo-

tion deﬁnitions from (Demszky et al., 2020) as indi-

cated in Table 1 to map perform the emotion map-

pings as indicated in Table 2.

5.2 Baseline Experiments

To evaluate the efﬁcacy of our proposed prompt-based

knowledge distillation model, we use PLMs as the

baseline for our experiments. We benchmark our

model using the BERT, RoBERTa, and XLNet PLMs

where the input will only be the document and tar-

get emotion(s). We evaluate the model performance

of each dataset and report the mean precision, recall,

and F

-scores after 3 runs using macro averaging.

5.3 Experimental Settings

Our model was constructed using the PyTorch frame-

work along with the HuggingFace transformers li-

brary for the pretrained language model implementa-

tions.

We followed similar experimental settings as

provided in (Demszky et al., 2020). Our model uses

the AdamW optimizer (Loshchilov and Hutter, 2017)

while setting the learning rate to 5e

−5

, batch size to

16, and maximum sequence length of 512. Since

previous research literature demonstrated overﬁtting

beyond four epochs, we limited our the number of

epochs during the ﬁne-tuning step to four (Demszky

https://huggingface.co/docs/transformers/en/index

et al., 2020). For the large language model, we uti-

lized the GPT-4o model provided through the API.

5.4 Experimental Results

Table 3 reﬂects the results from the experiments con-

ducted in this paper. Each experiment was executed

independently of other datasets. The best results are

indicated in bold. As reﬂected in the results, our

method is able to demonstrate increased performance

above the baseline methods for the WASSA-21 and

RWW datasets. This demonstrates that the PLM ac-

quires additional knowledge through transfer learn-

ing and knowledge distillation through this technique

that it did not acquire through the data alone. Fur-

thermore, we also discover that the RoBERTa PLM

is able to achieve superior performance over the other

PLMs evaluated in the tests we conducted. Despite

the extreme differences in the distribution of the la-

bels between the datasets as evidenced in Figure 2, we

observe that the proposed technique is able to work

given the task-agnostic knowledge provided from the

teacher model. When RoBERTa was used as the

underlying PLM for our technique, we were able

to achieve a gain of ∆ = +2.18 increase in perfor-

mance for the F

score for the WASSA-21 dataset and

∆ = +1.86 for the RWW dataset.

It should also be noted that the largest gain in

performance was achieved through the prompt-based

knowledge distillation approach with the BERT PLM

in the RWW dataset. We observe an increase of

∆ = +2.37 in the F

score under these settings.

6 CONCLUSIONS

Throughout our work in this paper, we investigated

the task of emotion analysis under a prompt-based

knowledge distillation setting where we trained a stu-

dent model by aligning it with a teacher model which

provides instruction on how to generate similar proba-

bility distributions in a task-speciﬁc objective. Future

directions for this work can involve exploring other

techniques, such as chain-of-thought or other reason-

ing approaches, or augmented LLM approaches to

improve the teacher model through prompting strate-

gies. The proposed methodology can be extended to

consider additional modalities of data.

REFERENCES

Agrawal, A. and An, A. (2012). Unsupervised emotion de-

tection from text using semantic and syntactic rela-

KDIR 2024 - 16th International Conference on Knowledge Discovery and Information Retrieval

332

Table 3: Comparison of baselines with experimental settings. Our proposed prompt-based knowledge distillation models

outperform the baseline models.

Type Model

WASSA-21 RWW

Precision Recall F1 Precision Recall F1

Baseline

BERT 68.52 68.67 67.70 18.81 19.33 18.80

RoBERTa 72.39 73.84 71.74 23.20 21.97 20.61

XLNet 60.74 63.18 60.92 20.40 20.26 18.52

Experiment

BERT+PKD 69.02 71.15 68.58 23.52 23.32 21.17

RoBERTa+PKD 73.85 75.16 73.92 24.63 22.69 22.47

XLNet+PKD 62.55 64.29 61.75 22.52 21.41 19.28

∆ Change +1.46 +1.32 +2.18 +1.43 +0.72 +1.86

tions. In 2012 IEEE/WIC/ACM International Confer-

ences on Web Intelligence and Intelligent Agent Tech-

nology, volume 1, pages 346–353.

Alhuzali, H. and Ananiadou, S. (2021). SpanEmo: Casting

multi-label emotion classiﬁcation as span-prediction.

In Merlo, P., Tiedemann, J., and Tsarfaty, R., edi-

tors, Proceedings of the 16th Conference of the Eu-

ropean Chapter of the Association for Computational

Linguistics: Main Volume, pages 1573–1584, Online.

Association for Computational Linguistics.

Baccianella, S., Esuli, A., and Sebastiani, F. (2010). Senti-

WordNet 3.0: An enhanced lexical resource for senti-

ment analysis and opinion mining. In Calzolari, N.,

Choukri, K., Maegaard, B., Mariani, J., Odijk, J.,

Piperidis, S., Rosner, M., and Tapias, D., editors, Pro-

ceedings of the Seventh International Conference on

Language Resources and Evaluation (LREC’10), Val-

letta, Malta. European Language Resources Associa-

tion (ELRA).

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.,

Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,

Askell, A., Agarwal, S., Herbert-Voss, A., Krueger,

G., Henighan, T., Child, R., Ramesh, A., Ziegler,

D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler,

E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner,

C., McCandlish, S., Radford, A., Sutskever, I., and

Amodei, D. (2020). Language models are few-shot

learners.

Buechel, S., Buffone, A., Slaff, B., Ungar, L., and Sedoc, J.

(2018). Modeling empathy and distress in reaction to

news stories. In Proceedings of the 2018 Conference

on Empirical Methods in Natural Language Process-

ing. Association for Computational Linguistics.

Buechel, S., Modersohn, L., and Hahn, U. (2020). Towards

label-agnostic emotion embeddings.

Calefato, F., Lanubile, F., and Novielli, N. (2018). Emotxt:

A toolkit for emotion recognition from text.

Demszky, D., Movshovitz-Attias, D., Ko, J., Cowen, A.,

Nemade, G., and Ravi, S. (2020). Goemotions: A

dataset of ﬁne-grained emotions. In Proceedings of

the 58th Annual Meeting of the Association for Com-

putational Linguistics. Association for Computational

Linguistics.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.

(2019). Bert: Pre-training of deep bidirectional trans-

formers for language understanding. In Proceedings

of the 2019 Conference of the North. Association for

Computational Linguistics.

Ekman, P. and Friesen, W. V. (1971). Constants across cul-

tures in the face and emotion. Journal of personality

and social psychology, 17(2):124.

Farquhar, S., Kossen, J., Kuhn, L., and Gal, Y. (2024). De-

tecting hallucinations in large language models using

semantic entropy. Nature, 630(8017):625–630.

Halder, K., Akbik, A., Krapac, J., and Vollgraf, R. (2020).

Task-aware representation of sentences for generic

text classiﬁcation. In Scott, D., Bel, N., and Zong, C.,

editors, Proceedings of the 28th International Confer-

ence on Computational Linguistics, pages 3202–3213,

Barcelona, Spain (Online). International Committee

on Computational Linguistics.

Hasan, M., Rundensteiner, E., and Agu, E. (2019). Auto-

matic emotion detection in text streams by analyzing

twitter data. International Journal of Data Science

and Analytics, 7.

Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling

the knowledge in a neural network. In NIPS Deep

Learning and Representation Learning Workshop.

Jiang, Y., Chan, C., Chen, M., and Wang, W. (2023). Lion:

Adversarial distillation of proprietary large language

models. In The 2023 Conference on Empirical Meth-

ods in Natural Language Processing.

Kleinberg, B., van der Vegt, I., and Mozes, M. (2020).

Measuring emotions in the covid-19 real world worry

dataset.

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D.,

Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov,

V. (2019). Roberta: A robustly optimized bert pre-

training approach.

Loshchilov, I. and Hutter, F. (2017). Decoupled weight

decay regularization. In International Conference on

Learning Representations.

Lukasik, M., Bhojanapalli, S., Menon, A. K., and Kumar, S.

(2022). Teacher’s pet: understanding and mitigating

biases in distillation. Transactions on Machine Learn-

ing Research.

Prompt Distillation for Emotion Analysis

333

Mackey, A., Gauch, S., and Labille, K. (2021). Detecting

fake news through emotion analysis.

Plutchik, R. (1982). A psychoevolutionary theory of emo-

tions.

Rahman, A. B. S., Ta, H.-T., Najjar, L., Azadmanesh,

A., and G

ul, A. S. (2024). Depressionemo: A

novel dataset for multilabel classiﬁcation of depres-

sion emotions.

Russell, J. A. and Mehrabian, A. (1977). Evidence for a

three-factor theory of emotions. Journal of research

in Personality, 11(3):273–294.

Staiano, J. and Guerini, M. (2014). Depeche mood: a lexi-

con for emotion analysis from crowd annotated news.

In Proceedings of the 52nd Annual Meeting of the As-

sociation for Computational Linguistics (Volume 2:

Short Papers), volume 2, pages 427–433.

Tafreshi, S., De Clercq, O., Barriere, V., Buechel, S., Sedoc,

J., and Balahur, A. (2021). WASSA 2021 shared task:

Predicting empathy and emotion in reaction to news

stories. In De Clercq, O., Balahur, A., Sedoc, J., Bar-

riere, V., Tafreshi, S., Buechel, S., and Hoste, V., ed-

itors, Proceedings of the Eleventh Workshop on Com-

putational Approaches to Subjectivity, Sentiment and

Social Media Analysis, pages 92–104, Online. Asso-

ciation for Computational Linguistics.

Toraman, C., S¸ ahinuc¸, F., and Yilmaz, E. (2022). Large-

scale hate speech detection with cross-domain trans-

fer. In Calzolari, N., B

echet, F., Blache, P., Choukri,

K., Cieri, C., Declerck, T., Goggi, S., Isahara, H.,

Maegaard, B., Mariani, J., Mazo, H., Odijk, J.,

and Piperidis, S., editors, Proceedings of the Thir-

teenth Language Resources and Evaluation Confer-

ence, pages 2215–2225, Marseille, France. European

Language Resources Association.

Turcan, E., Muresan, S., and McKeown, K. (2021).

Emotion-infused models for explainable psychologi-

cal stress detection. In Proceedings of the 2021 Con-

ference of the North American Chapter of the Associa-

tion for Computational Linguistics: Human Language

Technologies. Association for Computational Linguis-

tics.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B.,

Xia, F., Chi, E., Le, Q., and Zhou, D. (2022). Chain-

of-thought prompting elicits reasoning in large lan-

guage models.

Wullach, T., Adler, A., and Minkov, E. (2021). Fight ﬁre

with ﬁre: Fine-tuning hate detectors using large sam-

ples of generated hate speech. In Findings of the Asso-

ciation for Computational Linguistics: EMNLP 2021.

Association for Computational Linguistics.

Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov,

R. R., and Le, Q. V. (2019). Xlnet: Generalized

autoregressive pretraining for language understand-

ing. In Wallach, H., Larochelle, H., Beygelzimer,

A., d'Alch

e-Buc, F., Fox, E., and Garnett, R., editors,

Advances in Neural Information Processing Systems,

volume 32. Curran Associates, Inc.

KDIR 2024 - 16th International Conference on Knowledge Discovery and Information Retrieval

334