Text Classification in the Brazilian Legal Domain

Gustavo Coelho, Alimed Celecia, Jefferson de Sousa, Melissa Cavaliere, Maria Lima, Ana Mangeth, Isabella Frajhof, Cesar Cury, Marco Casanova

2022

Abstract

Text classification is a popular Natural Language Processing task that aims to predict the categorical label associated with each textual instance. One relevant application field for this task is the legal domain, which involves a high volume of unstructured textual documents. This paper proposes a new model for classifying legal opinions related to consumer complaints according to the moral damage value. The proposed model, named MuDEC (Multi-step Document Embedding-Based Classifier), combines Doc2vec for feature extraction and SVM for classification. To optimize classification performance, the model uses a combination of methods: oversampling to handle class imbalance, clustering to identify textual patterns, and dimensionality reduction to control complexity. For performance evaluation, a 6-class dataset of 193 legal opinions related to consumer complaints was created, in which each instance was manually labeled according to its moral damage value. A 10-fold stratified cross-validation resampling procedure was used to evaluate different models. The results demonstrated that, under this experimental setup, MuDEC outperforms the baseline models by a significant margin, achieving 78.7% accuracy, compared to 61.1% for a SIF classifier and 65.2% for a C-LSTM classifier.
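The evaluation protocol named in the abstract, stratified 10-fold cross-validation combined with oversampling, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: it omits Doc2vec, clustering, dimensionality reduction, and the SVM, and it assumes that oversampling is applied to the training folds only, which the abstract does not state. The function names and the class counts in the toy label list are hypothetical; only the totals (193 instances, 6 classes, 10 folds) come from the abstract.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each index to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class indices until every class matches the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for label in sorted(by_class):
        members = by_class[label]
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train, test) index lists; only the training part is oversampled."""
    folds = stratified_folds(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield random_oversample(train_idx, labels, seed + i), test_idx


# Hypothetical class counts summing to 193; the real distribution is not given here.
toy_counts = [60, 50, 35, 25, 13, 10]
toy_labels = [cls for cls, n in enumerate(toy_counts) for _ in range(n)]

for train_idx, test_idx in cross_validation_splits(toy_labels, k=10):
    train_counts = Counter(toy_labels[i] for i in train_idx)
    print(len(test_idx), dict(train_counts))
```

Oversampling after the split keeps duplicated instances out of the held-out fold, so each test fold contains only original, unseen opinions. A classifier would be fitted on `train_idx` and scored on `test_idx` in each iteration, and the ten accuracies averaged.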



Paper Citation


in Harvard Style

Coelho G., Celecia A., de Sousa J., Cavaliere M., Lima M., Mangeth A., Frajhof I., Cury C. and Casanova M. (2022). Text Classification in the Brazilian Legal Domain. In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-569-2, pages 355-363. DOI: 10.5220/0011062000003179


in Bibtex Style

@conference{iceis22,
author={Gustavo Coelho and Alimed Celecia and Jefferson de Sousa and Melissa Cavaliere and Maria Lima and Ana Mangeth and Isabella Frajhof and Cesar Cury and Marco Casanova},
title={Text Classification in the Brazilian Legal Domain},
booktitle={Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2022},
pages={355-363},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011062000003179},
isbn={978-989-758-569-2},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - Text Classification in the Brazilian Legal Domain
SN - 978-989-758-569-2
AU - Coelho G.
AU - Celecia A.
AU - de Sousa J.
AU - Cavaliere M.
AU - Lima M.
AU - Mangeth A.
AU - Frajhof I.
AU - Cury C.
AU - Casanova M.
PY - 2022
SP - 355
EP - 363
DO - 10.5220/0011062000003179