Optimizing High-Dimensional Text Embeddings in Emotion
Identification: A Sliding Window Approach
Hande Aka Uymaz (https://orcid.org/0000-0002-3535-3696) and Senem Kumova Metin (https://orcid.org/0000-0002-9606-3625)
İzmir University of Economics, Department of Software Engineering, İzmir, Turkey
{hande.aka, senem.kumova}@ieu.edu.tr
Keywords:
Natural Language Processing, Emotion, Large Language Models, Vector Space Models.
Abstract:
Natural language processing (NLP) is an interdisciplinary field that enables machines to understand and gen-
erate human language. One of the crucial steps in several NLP tasks, such as emotion and sentiment analysis,
text similarity, summarization, and classification, is transforming textual data sources into numerical form, a
process called vectorization. Vectorization methods can be grouped into traditional, semantic, and contextual approaches. Despite their advantages, the high-dimensional vectors these methods produce pose memory and computational challenges. To address these issues, we employed a sliding window technique to partition high-dimensional
vectors, aiming not only to enhance computational efficiency but also to detect emotional information within
specific vector dimensions. Our experiments utilized emotion lexicon words and emotionally labeled sen-
tences in both English and Turkish. By systematically analyzing the vectors, we identified consistent patterns
with emotional clues. Our findings suggest that focusing on specific sub-vectors rather than entire high-
dimensional BERT vectors can capture emotional information effectively, without performance loss. With this approach, we observed an increase in pairwise cosine similarity scores within emotion categories when using only sub-vectors. The results highlight the potential of sub-vector techniques, offering insights into
the nuanced integration of emotions in language and the applicability of these methods across different lan-
guages.
1 INTRODUCTION
Natural language processing (NLP) is a field at the in-
tersection of computer science, artificial intelligence,
and linguistics that aims to enable machines to un-
derstand and generate human language. In text-based
natural language processing, the first step is to convert
the given textual content into a numerical format that
computers can process. These numerical represen-
tations are expected to reflect the complex elements
of language, including grammatical rules, vocabulary,
and various linguistic components. In the field, the
process of converting textual data into numerical rep-
resentations is commonly referred to as vectorization.
The combined representation of documents within a
common vector space is known as the vector space
model (Manning et al., 2008). This model, which is
grounded in linear algebra, allows for vector-based
operations like addition, subtraction, and similarity
calculations.
We can examine vectorization methods in three
groups: traditional (e.g., one-hot encoding, TF, IDF), semantic (e.g., Word2Vec (Mikolov et al., 2013) and GloVe (Global Vectors for Word Representation) (Pennington et al., 2014)), and contextual (e.g., BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2018), GPT (Generative Pre-trained Transformer) (OpenAI, 2023), ELECTRA (Clark et al., 2020)) methods. Traditional methods
represent words as discrete, sparse vectors without
capturing semantic meaning. Semantic methods gen-
erate dense vectors that are designed to capture se-
mantics but fail to account for word polysemy. Con-
textual methods create vectors that vary with context,
capturing deeper semantics and polysemy informa-
tion. Traditional methods suffer from increased computational demand as the vocabulary grows and from a lack of semantic information, while semantic vectors neglect polysemy and assign a single vector to each word regardless of its context in a sentence. For these reasons, contextual vectors have recently been used more frequently in NLP problems and achieve better results.
Unlike static word embeddings, models such as
ELMO (Peters et al., 2018), BERT (Devlin et al.,
2018), and DistilBERT (Sanh et al., 2019) produce
embeddings that consider the word sense and poly-
semy by adapting to the specific context in which
a word is used. ELMO employs a bi-directional
long short-term memory architecture to create mul-
tiple vectors for words in different contexts, enhanc-
ing tasks such as question answering and sentiment
detection. BERT, introduced by Google, utilizes a
multi-layer bidirectional transformer encoder and a
masked language model approach, demonstrating strong performance in various NLP applications through transfer learning. BERT's significant potential and performance have led to the development of efficient variants such as RoBERTa (Liu et al., 2019), ALBERT (Lan et al., 2019), and DistilBERT (Sanh et al., 2019). Beyond BERT-based models, approaches like ULMFiT and XLNet have also shown promising results in tasks like sentiment and emotion analysis, further diversifying the landscape of contextual embeddings in NLP.
The vectors created to represent any text unit are
high-dimensional vectors (e.g., the vectors produced
from BERT-base and BERT-large models have dimen-
sions of 768 and 1024, respectively). When per-
forming classification, measuring similarity, and/or
running other procedures employing these high-
dimensional vectors, they can lead to significant
memory and computational costs, especially when
working with large datasets. Furthermore, feature
engineering holds great importance in classification
problems. Although high-dimensional vectors carry
detailed information, not all dimensions may be nec-
essary in the solution of a specific problem. Elimi-
nating irrelevant or low-information features can im-
prove the model’s performance and prevent over-
fitting. Additionally, feature selection can reduce the
computational costs and memory requirements of the
model, providing a significant advantage. In this context, we investigated the following three research questions (RQs) in this study:
RQ1. How can we enhance the effectiveness of
vector representations by optimizing computational
efficiency?
Our goal was to tackle the computational chal-
lenges associated with high-dimensional vectors, par-
ticularly when handling large datasets. By employ-
ing a sliding window method, we systematically ex-
amined recurring patterns within these vectors to en-
hance computational efficiency.
RQ2. Can we gain insights into the nuanced integration of emotions within language representations of text units?
As detailed in Section 3, we investigated
whether the method we applied to BERT vectors
of words/sentences labeled with different emotions
could detect emotional information in specific parts
of the vector representations.
RQ3. What are the differences or similarities be-
tween the application of an optimization approach on
vectors in the English and Turkish languages?
In the literature, many NLP methods that demonstrate success on English texts do not necessarily yield the same success or effects when applied to other languages. Therefore, both for this
reason and to make comparisons, we conducted ex-
periments for the proposed method in both English
and Turkish languages. The reason for choosing
Turkish as a second language is that it differs signif-
icantly from English in terms of grammar. The general features of Turkish include its agglutinative structure, vowel harmony, and frequent use of idioms and proverbs. For example, the 22-letter Turkish word "Anlamlandıramadıklarım" can be expressed in English as the six-word sentence "What I couldn't make sense of.".
In summary, we examined whether certain dimen-
sions within the representations of text units might in-
clude concealed information, such as emotions. This
led us to explore the possibility of detecting emotional
cues through a detailed analysis of these dimensions.
To achieve this goal, we employed a sliding window
approach to partition vectors and identify consistent
patterns, aiming to enhance computational efficiency
and gain a deeper understanding of the integration of
emotions within these vectors. Our experiments involved emotion lexicon words and emotionally labeled sentences, and we utilized BERT as the embedding model. Ultimately, this approach, which offers a
new perspective on emotional representation, can be
applied to any text unit, any embedding model, and
any hidden information that can be detected. The con-
tributions of the study can be listed as follows:
1. A dimensionality reduction technique through a
sliding window approach is introduced to parti-
tion high-dimensional vector representations of
texts into smaller sub-vectors, improving compu-
tational efficiency while maintaining or enhancing
the effectiveness of representations.
2. Specific sub-vectors within BERT vectors that
contain emotional information have been identi-
fied, suggesting that emotional clues are localized
within certain dimensions of the vectors.
3. Experiments utilizing only sub-vectors are con-
ducted in both English and Turkish, demonstrat-
ing the effectiveness of the proposed method for
two languages with different grammatical struc-
tures.
In the subsequent sections of the paper, Section
2 provides a literature review, Section 3 details the
proposed method, Section 4 presents the experiments
and results, and Section 5 concludes with the findings
and implications.
2 LITERATURE REVIEW
Vector space models refer to the numerical represen-
tation of text units (like words or phrases) in a vector
space. As can be seen in Figure 1, the models can be
considered in two different groups: context-free and
contextual models.
Figure 1: Vector space models.
Among the context-free models, traditional approaches like one-hot encoding, TF-IDF, and co-occurrence matrix representation lack semantic understanding. For
instance, co-occurrence matrix representation counts
word occurrences but fails to capture the nuances of
word meanings and their semantic associations. Thus,
these models struggle to comprehend the deeper
meaning and context of language, which brings a
drawback in tasks requiring semantic understanding,
such as sentiment analysis and language translation.
Semantic embeddings like Word2Vec and GloVe represent words with similar meanings close together in vector space. Capturing semantic relationships between words helps these models manage tasks like semantic similarity and word analogy. Although they have been a significant innovation in the field of NLP by encoding semantic information, these models generate only a single, static vector for each word. In other words, these context-free models do not account for polysemy or context.
Contextual models like BERT and ELMO produce
different embeddings based on the context in which
they are used, even for the same words with different
meanings. These contextual models are designed to
capture nuanced information in language and repre-
sent the complex relationships between words in var-
ious contexts. The representations are based on high-
dimensional embeddings, typically ranging from 512
to 1024 dimensions. For instance, BERT has two ver-
sions: BERT-base with 768 dimensions and BERT-
large with 1024 dimensions. Similarly, ELMO em-
beddings have 1024 dimensions. Two embedding
models from GPT, text-embedding-3-small, and text-
embedding-3-large, produce vectors with lengths of
1536 and 3072, respectively. While these high-dimensional embeddings capture rich and detailed linguistic information, they come with challenges such as increased computational complexity and memory requirements. In the literature, dimensionality reduction techniques, such as PCA (Principal Component Analysis) and t-SNE (t-Distributed Stochastic Neighbor Embedding), are often used to address these issues while
preserving the performance in several tasks (Raunak et al., 2019; Ayesha et al., 2020; George and Sumathy, 2022; Huertas-García et al., 2022; Zhang et al., 2024). For example, Zhang et al. (2024) investigate the effects of reducing the dimensionality of high-dimensional sentence embeddings.
The research assesses various unsupervised dimen-
sionality reduction techniques, such as PCA, SVD (truncated Singular Value Decomposition), KPCA (Kernel PCA), GRP (Gaussian Random Projections), and autoencoders, to compress these embeddings.
The aim is to cut down on storage and computational
expenses while preserving performance in different
downstream NLP tasks. Their findings indicate that
PCA is the most efficient method, achieving a 50%
reduction in dimensionality with only a 1% perfor-
mance loss. Notably, for some sentence encoders,
reducing dimensionality even enhanced accuracy. In
the research conducted by Su et al. (2021), a technique referred to as "whitening", which is based on PCA, is utilized to process BERT sentence representations. This method
reduces the embedding size to 256 and 384, aiming
to address the issue of anisotropy and diminish di-
mensionality. Experimental results on seven bench-
mark datasets demonstrate that their method substan-
tially enhances performance and reduces vector size,
optimizing memory storage and accelerating retrieval
speed.
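As a point of reference, a minimal NumPy sketch of this kind of PCA-based whitening is given below. It follows the general recipe (subtract the mean, decompose the covariance, rescale, truncate) rather than the authors' exact implementation; the variable names and the target dimensionality are illustrative assumptions.

```python
import numpy as np

def whiten(embeddings, k=256):
    """PCA-style whitening of sentence embeddings, truncated to k dimensions."""
    mu = embeddings.mean(axis=0, keepdims=True)       # (1, d) mean vector
    cov = np.cov(embeddings - mu, rowvar=False)       # (d, d) covariance matrix
    u, s, _ = np.linalg.svd(cov)                      # eigen-decomposition via SVD
    w = u / np.sqrt(s)                                # whitening matrix, (d, d)
    return (embeddings - mu) @ w[:, :k]               # whitened and truncated embeddings

# e.g. 1000 BERT sentence vectors of length 768 reduced to 256 dimensions
vectors = np.random.rand(1000, 768)
reduced = whiten(vectors, k=256)
print(reduced.shape)  # (1000, 256)
```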
Figure 2: Framework for vector partitioning with sliding window technique.
3 METHOD: DIMENSIONALITY
REDUCTION
Although contextual embeddings effectively capture
both semantic and contextual knowledge, their high-
dimensional vectors can be both space-consuming
and computationally expensive, especially with large
datasets. Additionally, specific dimensions or seg-
ments of these vectors might capture information re-
lated to specific features of language or properties of
the text unit they represent. In this study, we proposed
an alternative approach that emphasizes identifying
patterns within vectors of any text unit, thereby re-
ducing the complexity of the analysis. This approach
is adaptable to any vectorization model.
We conducted an experimental study to find sub-
vectors containing emotion information within BERT
vectors of sentences and words labeled with different
emotion categories (anger, fear, sadness, and joy) and
measured the performance of word and sentence rep-
resentations using only these sub-vectors. To perform
a comparative study and observe the method’s effec-
tiveness in different languages, we conducted the ex-
periments in both English and Turkish. Our proposed
methodology is summarized as follows:
1. A sliding window technique is employed to exam-
ine and extract meaningful patterns from BERT
vectors. This method divides the vectors into
smaller, fixed-size parts (windows), enabling us
to obtain local contextual information.
2. Cosine similarity between words (both for En-
glish and Turkish) labeled with the same emotion
category is measured using only certain windows
of BERT vectors for word representations. Here,
an increase in cosine similarity values is expected
if there is emotion-specific information in certain
windows of the vectors.
To determine the window size for the sliding window technique, we referred to the study of Su et al. (2021), who proposed a dimensionality reduction technique that decreases BERT vectors to lengths of 256 and 384. Thus, in our study, the window size
is selected as 256. Initially, BERT word vectors, la-
beled by 4 different emotion categories and having
a length of 768, are divided into sub-vectors with a
window size of 256. The slide size is determined
to be 64 to cover every dimension of the BERT vec-
tors. For example, the first sub-vector (window) starts
at dimension 1 and ends at dimension 256, and the
second one spans from dimension 65 to 320, as shown in detail in Figure 2. To sum up, employing the sliding window technique, we segmented the 768-dimensional BERT word vectors into nine sub-vectors.
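As a minimal illustration of this partitioning, the following Python sketch splits a 768-dimensional vector with the window size and slide size described above; the function name is ours and the input is a random vector rather than an actual BERT output.

```python
import numpy as np

WINDOW_SIZE = 256  # window length, following Su et al. (2021)
SLIDE = 64         # slide size chosen so that every dimension is covered

def sliding_windows(vector, window_size=WINDOW_SIZE, slide=SLIDE):
    """Split a 1-D embedding into overlapping, fixed-size sub-vectors (windows)."""
    starts = range(0, len(vector) - window_size + 1, slide)
    return [vector[start:start + window_size] for start in starts]

# A 768-dimensional BERT-base vector yields (768 - 256) / 64 + 1 = 9 windows:
# window 1 covers dims 1-256, window 2 covers dims 65-320, ..., window 9 covers dims 513-768.
vector = np.random.rand(768)
windows = sliding_windows(vector)
print(len(windows))  # 9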
4 EXPERIMENTS
In this study, we utilized the NRC English emotion
lexicon (Mohammad and Turney, 2013) words and the
Turkish-translated NRC emotion lexicon (TT-NRC)
(Aka Uymaz and Kumova Metin, 2023). Both lex-
icons are annotated with Plutchik's (Plutchik, 1980) emotion categories. In the experimental study, we
considered the lexicon words labeled by four emotion
categories, namely anger, fear, sadness, and joy, for
both languages. The initial step was obtaining BERT
vectors of each lexicon word. Because BERT con-
structs vectors for words based on their surrounding
context, both the words and the sentences containing them should be given as input to BERT.
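For illustration, one possible way to derive such a contextual word vector with the HuggingFace transformers library is sketched below, averaging the sub-word token embeddings of the target word. The checkpoint name and helper function are our assumptions (a Turkish BERT checkpoint would be substituted for Turkish), not necessarily the exact extraction script of the cited work.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative checkpoint; the paper follows the procedure of
# Aka Uymaz and Kumova Metin (2023) rather than this exact script.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def contextual_word_vector(sentence: str, word: str) -> torch.Tensor:
    """Return a 768-dim vector for `word` in `sentence` by averaging its sub-word tokens."""
    encoding = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**encoding).last_hidden_state[0]   # (num_tokens, 768)
    start = sentence.index(word)                           # first occurrence of the word
    # map each character of the word to its sub-word token position
    positions = sorted({encoding.char_to_token(start + i) for i in range(len(word))})
    return hidden[positions].mean(dim=0)                   # one 768-dim contextual vector

vec = contextual_word_vector("She was filled with joy after the news.", "joy")
print(vec.shape)  # torch.Size([768])
```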
Table 1: Pairwise in-category cosine similarity results of English words while using only one window.

Window             1      2      3      4      5      6      7      8      9
Anger-Anger        0.249  0.597  0.628  0.633  0.630  0.361  0.256  0.244  0.233
Fear-Fear          0.220  0.607  0.634  0.640  0.637  0.340  0.236  0.226  0.215
Sadness-Sadness    0.236  0.598  0.629  0.636  0.633  0.357  0.254  0.250  0.242
Joy-Joy            0.285  0.665  0.687  0.692  0.690  0.403  0.311  0.305  0.283
Table 2: Pairwise in-category cosine similarity results of Turkish words while using only one window.

Window             1      2      3      4      5      6      7      8      9
Anger-Anger        0.288  0.330  0.300  0.324  0.312  0.767  0.766  0.768  0.775
Fear-Fear          0.276  0.318  0.292  0.321  0.306  0.760  0.760  0.761  0.768
Sadness-Sadness    0.275  0.317  0.295  0.321  0.302  0.760  0.760  0.762  0.770
Joy-Joy            0.276  0.318  0.316  0.342  0.341  0.797  0.796  0.798  0.805
We followed the same technique as (Aka Uymaz and Kumova Metin, 2023) for deriving BERT vectors, utilizing the collection of three sentence datasets labeled by four emotion categories (anger, fear, sadness, joy): TEI (Mohammad and Bravo-Marquez, 2017), TEC (Mohammad, 2012), and TREMO (Tocoglu and Alpkocak, 2018). After applying our proposed sliding window technique, we divided each BERT vector
ing window technique, we divided each BERT vector
of lexicon words into 9 sub-vectors. Then, utilizing
these sub-vectors individually to represent each word
vector, we measured the pairwise cosine similarity
score between each word belonging to emotion cate-
gories (in-category cosine similarity). Cosine similarity ranges from -1 to 1: a value near 0 indicates that two vectors are unrelated, while 1 means they are identical in direction. In this study, a high cosine similarity
score may indicate that certain sub-vectors are bet-
ter at capturing that emotion category. For instance,
when assessing cosine similarity between two words
labeled with joy, we utilized only the subvectors span-
ning dimensions 1 to 256 and computed the cosine
similarity. This procedure was repeated for other win-
dows, resulting in nine cosine similarity experiments
for each word represented by a single sub-vector. The outcomes are shown as heat maps in Tables 1 and 2 for English and Turkish lexicon words, respectively.
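The per-window measurement itself can be sketched as follows: for the words of one emotion category, the average pairwise cosine similarity is computed separately for each window, so that the more emotion-bearing windows stand out. The helper names and the random stand-in vectors are illustrative only.

```python
from itertools import combinations
import numpy as np

def cosine(u, v):
    """Plain cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def in_category_similarity_per_window(word_vectors, window_size=256, slide=64):
    """Average pairwise cosine similarity within one emotion category, per window."""
    dim = word_vectors.shape[1]
    scores = []
    for start in range(0, dim - window_size + 1, slide):
        subvectors = word_vectors[:, start:start + window_size]
        pairs = combinations(range(len(subvectors)), 2)
        sims = [cosine(subvectors[i], subvectors[j]) for i, j in pairs]
        scores.append(float(np.mean(sims)))
    return scores  # one score per window (nine for 768-dim vectors with these settings)

# e.g. 50 lexicon words labeled "joy", each a 768-dim BERT vector (random stand-ins here)
joy_vectors = np.random.rand(50, 768)
print(in_category_similarity_per_window(joy_vectors))
```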
The heat maps reveal that certain dimensions
within BERT vectors contain emotional clues. Con-
sequently, employing specific subsets of these vectors
in cosine similarity assessments yields higher similar-
ity compared to others. This implies that focusing on
subsets can be sufficient instead of utilizing all 768-
dimensional vectors. Specifically, our examination of
English word vectors identified emotional data within
windows 2, 3, 4, and 5, while in Turkish, emotional information was found within windows 6, 7, 8, and 9.
After analyzing the in-category cosine similarity among lexicon words represented by window-based vectors, we applied these findings to a specific
process in emotion identification: emotion enrich-
ment of text units. The experimental study on emo-
tion enrichment consists of two phases: sentence sub-
vector construction and emotion enrichment on sen-
tence vectors.
In this phase of the experimental study, we uti-
lize the TEI (Mohammad and Bravo-Marquez, 2017),
TEC (Mohammad, 2012), and TREMO (Tocoglu and
Alpkocak, 2018) datasets. Among these, TREMO is a
Turkish dataset, while the others are English datasets.
To enable experiments with both English and Turk-
ish, we translated the English datasets into Turkish
and the Turkish dataset into English. Subsequently,
we selected 500 sentences from each emotion cat-
egory (anger, fear, sadness, joy) randomly, to con-
struct the Emotion Sentence Dataset (ESD) used in
the sentence-based experiments. In order to construct
sentence sub-vectors, firstly, as an alternative to using the 768-dimensional BERT vectors for sentence representations, we utilized the sub-parts identified as carrying emotional information in the preceding word-based experiments for both English and Turkish, as can be seen in detail in Figure 3. We combined the sub-parts that yielded the best results in each language.
For instance, it was found that the English BERT vec-
tors had more emotive information in sub-vectors 2,
3, 4, and 5. These sub-parts were concatenated to
create a vector that spans from the start of the sec-
ond window’s dimensions to the end of the fifth win-
dow’s dimensions. The process for combining these
sub-vectors is illustrated in Figure 4.
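Because consecutive windows overlap, concatenating the selected windows in this way amounts to taking one contiguous span of the original vector. The hypothetical helper below illustrates this reading for the English (windows 2-5) and Turkish (windows 6-9) cases, using 0-based indexing.

```python
import numpy as np

WINDOW_SIZE, SLIDE = 256, 64

def combined_span(vector, first_window, last_window):
    """Contiguous slice from the start of `first_window` to the end of `last_window` (1-based window numbers)."""
    start = (first_window - 1) * SLIDE                # 0-based start of the first selected window
    end = (last_window - 1) * SLIDE + WINDOW_SIZE     # 0-based end (exclusive) of the last selected window
    return vector[start:end]

sentence_vector = np.random.rand(768)
english_sub = combined_span(sentence_vector, 2, 5)    # dims 65-512, length 448
turkish_sub = combined_span(sentence_vector, 6, 9)    # dims 321-768, length 448
print(english_sub.shape, turkish_sub.shape)           # (448,) (448,)
```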
Later, we evaluated the performance of BERT vectors and sub-vectors of sentences in both languages in the emotion enrichment process (EEP). In
studies on emotion classification or detection, emo-
tion/sentiment enrichment is a frequently researched
process in the literature (Agrawal et al., 2018; Wong-
patikaseree et al., 2021; Matsumoto et al., 2022). It
Figure 3: Flowchart for dimensionality reduction for word and sentence vectors.
Figure 4: Framework for extracting sub-vectors.
has been observed in studies that although semantic
and contextual embeddings demonstrate significant
success in representing any text unit, they have some
shortcomings in expressing emotional information.
Therefore, it has been suggested that these vectors be
enhanced by adding emotional information. Studies
using cosine similarity-based or classification-based
approaches with vectors containing emotional infor-
mation have shown higher success. Various methods
have been proposed in the literature. In this study,
we applied the emotion enrichment method proposed
by (Aka Uymaz and Kumova Metin, 2023) to our
English and Turkish sentence datasets. In summary,
this method works by comparing the vectors to be en-
riched with the vectors of emotion lexicon words. In
this comparison, the similarity (cosine similarity) of
each word to the emotional words in the lexicon is
calculated. The closest emotional words are identi-
fied, and their vectors are used to enhance the orig-
inal word’s vector by weighting and averaging them
based on their emotional relevance. Finally, a hybrid
word representation is constructed by integrating se-
mantic/contextual and emotional embeddings.
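A loose sketch of such an enrichment step is given below. It should be read as an illustration of the general idea (nearest lexicon words, similarity-based weighting, blending with the original vector) rather than the exact procedure of (Aka Uymaz and Kumova Metin, 2023); the number of neighbours and the blending rule are our assumptions.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def enrich(vector, lexicon_vectors, k=5):
    """Blend `vector` with the similarity-weighted average of its k closest lexicon vectors."""
    sims = np.array([cosine(vector, lex) for lex in lexicon_vectors])
    top = np.argsort(sims)[-k:]                            # indices of the k most similar lexicon words
    weights = sims[top] / sims[top].sum()                  # similarity-based weights
    emotional = (lexicon_vectors[top] * weights[:, None]).sum(axis=0)
    return np.mean([vector, emotional], axis=0)            # hybrid representation (simple average here)

lexicon = np.random.rand(200, 768)   # stand-in for NRC lexicon word vectors
sentence = np.random.rand(768)       # stand-in for a sentence vector to be enriched
hybrid = enrich(sentence, lexicon)
print(hybrid.shape)  # (768,)
```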
In the experiments involving the emotion enrich-
ment process, we used Turkish and English sentences
as the text units to be enriched with emotional infor-
mation. Then, we calculated pairwise in-category co-
sine similarity scores within every emotion category
before and after enrichment. For the vector repre-
sentation of the sentences, we used 768-dimensional
BERT vectors and the BERT sub-vectors obtained in
the previous stage. The lexicons we used in the emo-
tion enrichment process were the NRC and TT-NRC
lexicons. We followed the same procedure for the
vector representation of the lexicon words as we did
for the sentences. That is, we first represented the
lexicon words with BERT, then subjected the words
to the enrichment process as in (Aka Uymaz and Ku-
mova Metin, 2023), and finally obtained their sub-
vectors.
Tables 3 and 4 present the results of the emotion enrichment process for English and Turkish sentences, represented with BERT vectors and BERT sub-vectors. The
first row in each table presents the average cosine sim-
ilarity results within emotion categories for sentences,
using BERT vectors of 768 lengths without additional
enrichment. We used these values as a baseline and
evaluated the outcomes of various enrichment com-
binations in comparison, showcasing the increments as percentages in the tables.
Table 3: English sentence embeddings enrichment with several combinations; values are in-category cosine similarity (% improvement over the unenriched BERT baseline), with the best results in the last row.

Sentence embedding   Enrichment by                                  Anger           Fear            Joy             Sadness         Average
BERT                 -                                              0.610 (-)       0.593 (-)       0.623 (-)       0.597 (-)       0.606 (-)
BERT                 Emotion Lexicon Words (BERT + EEP)             0.844 (38.36%)  0.838 (41.32%)  0.879 (41.09%)  0.845 (41.54%)  0.852 (40.57%)
BERT Subvector       Emotion Lexicon Words Subvector (BERT + EEP)   0.885 (45.09%)  0.880 (48.44%)  0.905 (45.28%)  0.883 (47.88%)  0.888 (46.65%)
Table 4: Turkish sentence embeddings enrichment with several combinations; values are in-category cosine similarity (% improvement over the unenriched BERT baseline), with the best results in the last row.

Sentence embedding   Enrichment by                                  Anger           Fear            Joy             Sadness         Average
BERT                 -                                              0.752 (-)       0.747 (-)       0.758 (-)       0.747 (-)       0.751 (-)
BERT                 Emotion Lexicon Words (BERT + EEP)             0.922 (22.61%)  0.931 (24.63%)  0.943 (24.41%)  0.927 (24.10%)  0.931 (23.93%)
BERT Subvector       Emotion Lexicon Words Subvector (BERT + EEP)   0.953 (26.67%)  0.959 (28.45%)  0.966 (27.45%)  0.956 (28.03%)  0.959 (27.65%)
In the second row, 768-dimensional BERT vectors were subjected to the
emotion enrichment process with 768-dimensional
lexicon word vectors, while in the third row, both sentence and lexicon word sub-vectors were used for representation and then subjected to the emotion enrichment process. As can be seen, in both languages, the in-category cosine similarity results of emotionally enriched sentence vectors yield the best outcome when sub-vectors of both the sentence and the lexicon word vectors are utilized, for all four
emotions. The best results in both languages are observed in the joy emotion category, with scores of 0.905 for English and 0.966 for Turkish. These
results provide promising insights into the effective-
ness of using sub-vectors instead of high-dimensional
vectors, both for the emotion enrichment process and
potentially reducing computational costs due to de-
creased vector size.
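For completeness, the kind of comparison reported in Tables 3 and 4 reduces to measuring the average pairwise in-category similarity before and after enrichment and expressing the change as a percentage, as in the hypothetical sketch below (random stand-in vectors).

```python
from itertools import combinations
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def avg_in_category_similarity(vectors):
    """Average pairwise cosine similarity over all sentence vectors of one emotion category."""
    pairs = combinations(range(len(vectors)), 2)
    return float(np.mean([cosine(vectors[i], vectors[j]) for i, j in pairs]))

# stand-ins for 500 "joy" sentence vectors before and after the enrichment process
baseline = np.random.rand(500, 768)
enriched = baseline + 0.1 * np.random.rand(500, 768)

before = avg_in_category_similarity(baseline)
after = avg_in_category_similarity(enriched)
print(f"improvement: {100 * (after - before) / before:.2f}%")
```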
5 CONCLUSION
Natural language processing stands as a bridge between computer science, artificial intelligence, and linguistics, focusing on machines that can comprehend and generate human language, with applications in various domains such as sentiment analysis, text summarization, and classification.
One of the most important processes in NLP
studies is vectorization, which is simply the trans-
formation of textual data into numerical representa-
tions, for any computational analysis. Unlike traditional methods like TF-IDF, newer techniques such as Word2Vec and BERT gained popularity because they capture semantic and contextual knowledge, respectively, enriching the depth of linguistic representation. In particular, contextual models like BERT and its
derivatives not only capture word semantics but also
adapt to the nuanced contextual usage and polysemy,
thereby addressing the limitations of traditional and
semantic approaches.
However, the use of high-dimensional vectors
poses computational challenges, particularly in large
datasets. Feature selection and computational efficiency enhancements emerge as key optimization strategies. In this context, we identified
three research questions for our study:
RQ1. How can we enhance the effectiveness of
vector representations by optimizing computational
efficiency?
RQ2. Can we gain insights into the nuanced integration of emotions within language representations of text units?
RQ3. What are the differences or similarities be-
tween the application of an optimization approach on
vectors in the English and Turkish languages?
Firstly, related to RQ1, we proposed a sliding win-
dow technique to partition vectors into smaller, fixed-
size parts, enabling the extraction of local contextual
information. This method was evaluated through the pairwise cosine similarity metric among emotion lexicon words annotated with four emotion categories, using both English and Turkish to address RQ3.
Our experimental findings, as an answer to RQ2, revealed that certain dimensions of BERT vectors are more informative regarding emotional content. This suggests that using sub-
vectors may effectively capture emotional clues and
nuances in the languages, potentially reducing the
need to utilize entire high-dimensional vector repre-
sentations.
In the subsequent phase, we applied our findings
to sentence vectors, constructing sentence sub-vectors
based on the identified emotional dimensions (accord-
ing to determined windows) from the word-based ex-
periments. Then, to test our hypothesis on the effec-
tiveness of using specific vector segments, we con-
ducted experiments with these subvectors in compari-
son to using the original vectors in a case study related
to the emotion enrichment process on vectors. This process simply augments vectors with additional emotional information. The comparative analysis be-
tween English and Turkish highlighted the adaptabil-
ity of our method to different languages, acknowl-
edging the grammatical and structural differences of
Turkish.
When we examined the experimental results, we
found that using specific sub-vectors instead of the
original BERT vectors was both sufficient and could
improve performance in cosine similarity calculations
within emotion categories at both the word and sen-
tence levels. As far as we know, this perspective and
method have not been previously studied in terms of
their applicability to any text unit represented by any
vectorization method. Additionally, this approach might be effective in capturing different types of information in vector representations and in adapting to different problems.
In future studies, similar experiments can be con-
ducted on other large language models (e.g., GPT
models (OpenAI, 2023), RoBERTa (Liu et al., 2019),
ELMO (Peters et al., 2018)) that have shown success-
ful results in the literature. This approach may en-
able the investigation of different sub-vectors contain-
ing emotional information in these models and to get
new perspectives. In our study, we carried out com-
parative analyses on English, a language rich in re-
sources, and Turkish, an agglutinative language with
fewer resources and a different grammatical structure.
This study can be expanded to include languages from
different language families and with various features.
Additionally, vectors can be reanalyzed for different
problems or information searches and the effective-
ness of the approach in various scenarios can be ex-
amined.
REFERENCES
Agrawal, A., An, A., and Papagelis, M. (2018). Learning
emotion-enriched word representations. In Proceed-
ings of the 27th International Conference on Compu-
tational Linguistics, pages 950–961, Santa Fe, New
Mexico, USA. Association for Computational Lin-
guistics.
Aka Uymaz, H. and Kumova Metin, S. (2023). Emotion-
enriched word embeddings for Turkish. Expert Sys-
tems with Applications, 225:120011.
Ayesha, S., Hanif, M. K., and Talib, R. (2020). Overview
and comparative study of dimensionality reduction
techniques for high dimensional data. Information Fu-
sion, 59:44–58.
Clark, K., Luong, M.-T., Le, Q. V., and Manning, C. D.
(2020). Electra: Pre-training text encoders as discrim-
inators rather than generators.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2018). Bert: Pre-training of deep bidirectional trans-
formers for language understanding.
George, L. and Sumathy, P. (2022). An integrated clustering
and bert framework for improved topic modeling.
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P.,
and Soricut, R. (2019). Albert: A lite bert for self-
supervised learning of language representations.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D.,
Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov,
V. (2019). Roberta: A robustly optimized bert pre-
training approach.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press, Cambridge, UK.
Matsumoto, K., Matsunaga, T., Yoshida, M., and Kita,
K. (2022). Emotional similarity word embedding
model for sentiment analysis. Computación y Sistemas, 26(2).
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).
Efficient estimation of word representations in vector
space. Proceedings of Workshop at ICLR.
Mohammad, S. (2012). #emotional tweets. Proceedings of
the First Joint Conference on Lexical and Computa-
tional Semantics (*SEM).
Mohammad, S. and Bravo-Marquez, F. (2017). Emotion
intensities in tweets. In Proceedings of the 6th Joint
Conference on Lexical and Computational Semantics
(*SEM 2017), pages 65–77, Vancouver, Canada. As-
sociation for Computational Linguistics.
Mohammad, S. M. and Turney, P. D. (2013). Crowdsourc-
ing a word-emotion association lexicon. Computa-
tional Intelligence, 29(3):436–465.
OpenAI (2023). Gpt-large language model.
Pennington, J., Socher, R., and Manning, C. D. (2014).
Glove: Global vectors for word representation. In EMNLP.
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark,
C., Lee, K., and Zettlemoyer, L. (2018). Deep con-
textualized word representations. In Proceedings of
the 2018 Conference of the North American Chapter
of the Association for Computational Linguistics: Hu-
man Language Technologies, Volume 1, New Orleans,
Louisiana. Association for Computational Linguistics.
Plutchik, R. (1980). A general psychoevolutionary theory
of emotion. In Plutchik, R. and Kellerman, H., editors,
Theories of Emotion, pages 3–33. Academic Press.
Raunak, V., Gupta, V., and Metze, F. (2019). Effective di-
mensionality reduction for word embeddings. In Au-
genstein, I., Gella, S., Ruder, S., Kann, K., Can, B.,
Welbl, J., Conneau, A., Ren, X., and Rei, M., editors,
Proceedings of the 4th Workshop on Representation
Learning for NLP (RepL4NLP-2019), pages 235–243,
Florence, Italy. Association for Computational Lin-
guistics.
Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019).
Distilbert, a distilled version of bert: smaller, faster,
cheaper and lighter. ArXiv, abs/1910.01108.
Su, J., Cao, J., Liu, W., and Ou, Y. (2021). Whitening sen-
tence representations for better semantics and faster
retrieval.
Tocoglu, M. and Alpkocak, A. (2018). Tremo: A dataset for
emotion analysis in Turkish. Journal of Information
Science, 44:016555151876101.
Wongpatikaseree, K., Kaewpitakkun, Y., Yuenyong, S.,
Matsuo, S., and Yomaboot, P. (2021). Emocnn: En-
coding emotional expression from text to word vec-
tor and classifying emotions—a case study in thai
social network conversation. Engineering Journal,
25(7):73–82.
Zhang, G., Zhou, Y., and Bollegala, D. (2024). Evalu-
ating unsupervised dimensionality reduction methods
for pretrained sentence embeddings.
Huertas-García, Á., Martín, A., Huertas-Tato, J., and Camacho, D. (2022). Exploring dimensionality reduction techniques in multilingual transformers.