An Empirical Study to Use Large Language Models to Extract Named Entities from Repetitive Texts

Angelica Lo Duca

doi:10.5220/0013066500003825

2024

Abstract

Large language models (LLMs) are a recent technology that helps researchers, developers, and people in general complete their tasks quickly. The main difficulties in using this technology are defining effective instructions for the models, understanding the models’ behavior, and evaluating the correctness of the produced results. This paper describes a possible approach based on LLMs to extract named entities from repetitive texts, such as population registries. The paper focuses on two LLMs (GPT 3.5 Turbo and GPT 4) and runs empirical experiments that vary the level of detail contained in the instructions. Results show that the best performance is achieved with GPT 4 and a high level of detail in the instructions, at the highest cost. The best trade-off between cost and performance is obtained with GPT 3.5 Turbo and a medium level of detail.

Paper Citation

in Harvard Style

Lo Duca A. (2024). An Empirical Study to Use Large Language Models to Extract Named Entities from Repetitive Texts. In Proceedings of the 20th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST; ISBN 978-989-758-718-4, SciTePress, pages 417-424. DOI: 10.5220/0013066500003825

in Bibtex Style

@conference{webist24,
author={Angelica Lo Duca},
title={An Empirical Study to Use Large Language Models to Extract Named Entities from Repetitive Texts},
booktitle={Proceedings of the 20th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2024},
pages={417-424},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013066500003825},
isbn={978-989-758-718-4},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 20th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - An Empirical Study to Use Large Language Models to Extract Named Entities from Repetitive Texts
SN - 978-989-758-718-4
AU - Lo Duca A.
PY - 2024
SP - 417
EP - 424
DO - 10.5220/0013066500003825
PB - SciTePress