Information Extraction in the Legal Domain: Traditional Supervised Learning vs. ChatGPT

Gustavo Coelho, Alimed Celecia, Jefferson de Sousa, Melissa Lemos, Maria Lima, Ana Mangeth, Isabella Frajhof, Marco Casanova

2024

Abstract

Information Extraction is an important task in the legal domain. While structured, machine-processable data is scarce, unstructured data in the form of legal documents, such as legal opinions, is widely available. If properly processed, such documents can provide valuable information about past lawsuits, allowing better assessment by legal professionals and supporting data-driven applications. This paper addresses information extraction in the Brazilian legal domain by extracting structured features from legal opinions related to consumer complaints. It explores two approaches. The first uses traditional supervised learning, essentially treating the extraction of categorical features as text classification and the extraction of numerical features as named entity recognition. The second takes advantage of the recent popularization of Large Language Models (LLMs) to extract categorical and numerical features using ChatGPT and prompt engineering techniques. The paper demonstrates that while both approaches reach similar overall performance in terms of traditional evaluation metrics, ChatGPT substantially reduces the complexity and time required by the process.

Paper Citation


in Harvard Style

Coelho G., Celecia A., de Sousa J., Lemos M., Lima M., Mangeth A., Frajhof I. and Casanova M. (2024). Information Extraction in the Legal Domain: Traditional Supervised Learning vs. ChatGPT. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7, SciTePress, pages 579-586. DOI: 10.5220/0012499800003690


in Bibtex Style

@conference{iceis24,
author={Gustavo Coelho and Alimed Celecia and Jefferson de Sousa and Melissa Lemos and Maria Lima and Ana Mangeth and Isabella Frajhof and Marco Casanova},
title={Information Extraction in the Legal Domain: Traditional Supervised Learning vs. ChatGPT},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={579-586},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012499800003690},
isbn={978-989-758-692-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Information Extraction in the Legal Domain: Traditional Supervised Learning vs. ChatGPT
SN - 978-989-758-692-7
AU - Coelho G.
AU - Celecia A.
AU - de Sousa J.
AU - Lemos M.
AU - Lima M.
AU - Mangeth A.
AU - Frajhof I.
AU - Casanova M.
PY - 2024
SP - 579
EP - 586
DO - 10.5220/0012499800003690
PB - SciTePress