experiments. Then, to test our hypothesis on the effectiveness of using specific vector segments, we conducted experiments comparing these sub-vectors against the original vectors in a case study on emotion enrichment, a process that simply augments vectors with additional emotional information. The comparative analysis between English and Turkish highlighted the adaptability of our method to different languages, accounting for the grammatical and structural differences of Turkish.
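To make the enrichment step concrete, the sketch below appends lexicon-based emotion scores to an existing embedding. It is a minimal illustration under stated assumptions: the toy lexicon, the emotion set, and the enrich helper are hypothetical, not the exact resources or procedure used in our experiments.

```python
import numpy as np

# Hedged sketch of an emotion-enrichment step. The lexicon below and
# the `enrich` helper are illustrative assumptions, not the paper's
# exact procedure or resources.
EMOTIONS = ["anger", "fear", "joy", "sadness"]
emotion_lexicon = {
    "happy": {"joy": 1.0},      # toy entries; a real lexicon
    "afraid": {"fear": 1.0},    # (e.g., NRC-style) would be larger
}

def enrich(vector: np.ndarray, word: str) -> np.ndarray:
    """Append per-emotion scores for `word` to its embedding."""
    scores = emotion_lexicon.get(word, {})
    emo = np.array([scores.get(e, 0.0) for e in EMOTIONS])
    return np.concatenate([vector, emo])

# A toy 4-dimensional "embedding"; real BERT vectors have 768 dimensions.
enriched = enrich(np.array([0.1, -0.2, 0.3, 0.05]), "happy")
print(enriched.shape)  # (8,): original dimensions plus one per emotion
```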
When we examined the experimental results, we found that using specific sub-vectors instead of the original BERT vectors was not only sufficient but could also improve performance in cosine similarity calculations within emotion categories, at both the word and sentence levels. To the best of our knowledge, this perspective and method have not previously been studied in terms of their applicability to any text unit represented by any vectorization method. Additionally, this approach may be effective in capturing different types of information in vector representations and in adapting to different problems.
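For concreteness, the following sketch shows the core idea of scoring similarity on a sliding-window segment of an embedding rather than on the full vector. The window position, width, function names, and random toy vectors are illustrative assumptions; in practice the informative segment would be selected empirically, as in our experiments.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def subvector(vec: np.ndarray, start: int, width: int) -> np.ndarray:
    """Slice one sliding-window segment [start, start+width) out of a
    full embedding (e.g., a 768-dimensional BERT vector)."""
    return vec[start:start + width]

# Toy comparison: similarity over full vectors vs. over one segment.
rng = np.random.default_rng(0)
v1, v2 = rng.normal(size=768), rng.normal(size=768)
print(cosine(v1, v2))                                         # full vectors
print(cosine(subvector(v1, 128, 64), subvector(v2, 128, 64))) # one segment
```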
In future studies, similar experiments can be conducted on other large language models that have shown successful results in the literature (e.g., GPT models (OpenAI, 2023), RoBERTa (Liu et al., 2019), ELMo (Peters et al., 2018)). This would enable the investigation of which sub-vectors carry emotional information in these models and may offer new perspectives. In our study, we carried out comparative analyses on English, a resource-rich language, and Turkish, an agglutinative language with fewer resources and a different grammatical structure. This study can be expanded to include languages from different language families and with various features. Additionally, vectors can be reanalyzed for other problems or types of information, and the effectiveness of the approach in various scenarios can be examined.
REFERENCES
Agrawal, A., An, A., and Papagelis, M. (2018). Learning
emotion-enriched word representations. In Proceed-
ings of the 27th International Conference on Compu-
tational Linguistics, pages 950–961, Santa Fe, New
Mexico, USA. Association for Computational Lin-
guistics.
Aka Uymaz, H. and Kumova Metin, S. (2023). Emotion-
enriched word embeddings for Turkish. Expert Sys-
tems with Applications, 225:120011.
Ayesha, S., Hanif, M. K., and Talib, R. (2020). Overview
and comparative study of dimensionality reduction
techniques for high dimensional data. Information Fu-
sion, 59:44–58.
Clark, K., Luong, M.-T., Le, Q. V., and Manning, C. D. (2020). ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
George, L. and Sumathy, P. (2022). An integrated clustering and BERT framework for improved topic modeling.
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2019). ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press, Cambridge, UK.
Matsumoto, K., Matsunaga, T., Yoshida, M., and Kita, K. (2022). Emotional similarity word embedding model for sentiment analysis. Computación y Sistemas, 26(2).
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of the Workshop at ICLR.
Mohammad, S. (2012). #Emotional tweets. In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM).
Mohammad, S. and Bravo-Marquez, F. (2017). Emotion
intensities in tweets. In Proceedings of the 6th Joint
Conference on Lexical and Computational Semantics
(*SEM 2017), pages 65–77, Vancouver, Canada. As-
sociation for Computational Linguistics.
Mohammad, S. M. and Turney, P. D. (2013). Crowdsourc-
ing a word-emotion association lexicon. Computa-
tional Intelligence, 29(3):436–465.
OpenAI (2023). GPT large language model.
Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark,
C., Lee, K., and Zettlemoyer, L. (2018). Deep con-
textualized word representations. In Proceedings of
the 2018 Conference of the North American Chapter
of the Association for Computational Linguistics: Hu-
man Language Technologies, Volume 1, New Orleans,
Louisiana. Association for Computational Linguistics.
Plutchik, R. (1980). A general psychoevolutionary theory
of emotion. In Plutchik, R. and Kellerman, H., editors,
Theories of Emotion, pages 3–33. Academic Press.
Raunak, V., Gupta, V., and Metze, F. (2019). Effective di-
mensionality reduction for word embeddings. In Au-
genstein, I., Gella, S., Ruder, S., Kann, K., Can, B.,
Welbl, J., Conneau, A., Ren, X., and Rei, M., editors,
Proceedings of the 4th Workshop on Representation
Learning for NLP (RepL4NLP-2019), pages 235–243,
Florence, Italy. Association for Computational Lin-
guistics.