Authors:
Gustavo Coelho 1; Alimed Celecia 1; Jefferson de Sousa 1; Melissa Lemos 1; Maria Lima 1; Ana Mangeth 2; Isabella Frajhof 2 and Marco Casanova 1
Affiliations:
1 Tecgraf - PUC-Rio, Rio de Janeiro, Brazil; 2 LES - PUC-Rio, Rio de Janeiro, Brazil
Keyword(s):
Natural Language Processing, Information Extraction, Text Classification, Named Entity Recognition, Large Language Models, Prompt Engineering.
Abstract:
Information extraction is an important task in the legal domain. While structured, machine-processable data is scarce, unstructured data in the form of legal documents, such as legal opinions, is widely available. If properly processed, such documents can provide valuable information about past lawsuits, allowing better assessment by legal professionals and supporting data-driven applications. This paper addresses information extraction in the Brazilian legal domain by extracting structured features from legal opinions related to consumer complaints. To address this task, the paper explores two approaches. The first uses traditional supervised learning methods, treating the extraction of categorical features as text classification and the extraction of numerical features as named entity recognition. The second takes advantage of the recent popularization of Large Language Models (LLMs) to extract categorical and numerical features using ChatGPT and prompt engineering techniques. The paper demonstrates that, while both approaches achieve similar overall performance in terms of traditional evaluation metrics, ChatGPT substantially reduces the complexity and time required throughout the process.