On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts

Gautam Shahi; Oliver Hummel

doi:10.5220/0013299100003929

2025

Abstract

The rapid advancement of Large Language Models (LLMs) has led to a multitude of application opportunities. One traditional task for Information Retrieval systems is the summarization and classification of texts, both of which are important for helping humans navigate large bodies of literature, such as the one formed by scientific publications. Because this body of scientific knowledge is growing rapidly, recent research has aimed at building research information systems that offer not only traditional keyword search but also novel features, such as the automatic detection of the research areas present at knowledge-intensive organizations in academia and industry. To support this idea, we present the results of evaluating a variety of LLMs on their ability to sort scientific publications into hierarchical classification systems. Using the FORC dataset as ground truth, we found that recent LLMs (such as Meta’s Llama 3.1) reach an accuracy of up to 0.82, which is up to 0.08 higher than that of traditional BERT models.

Paper Citation

in Harvard Style

Shahi G. and Hummel O. (2025). On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts. In Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-749-8, SciTePress, pages 544-554. DOI: 10.5220/0013299100003929

in Bibtex Style

@conference{iceis25,
author={Gautam Shahi and Oliver Hummel},
title={On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts},
booktitle={Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2025},
pages={544--554},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013299100003929},
isbn={978-989-758-749-8},
}

in EndNote Style

TY  - CONF
JO  - Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts
SN  - 978-989-758-749-8
AU  - Shahi G.
AU  - Hummel O.
PY  - 2025
SP  - 544
EP  - 554
DO  - 10.5220/0013299100003929
PB  - SciTePress
ER  - 