
Paper: Text Classification in the Brazilian Legal Domain

Authors: Gustavo M. C. Coelho 1 ; Alimed Celecia 1 ; Jefferson de Sousa 1 ; Melissa Cavaliere 1 ; Maria Julia Lima 1 ; Ana Mangeth 2 ; Isabella Frajhof 2 ; Cesar Cury 3 and Marco Casanova 1

Affiliations: 1 Tecgraf, PUC-Rio, Rio de Janeiro, Brazil ; 2 LES, PUC-Rio, Rio de Janeiro, Brazil ; 3 Escola da Magistratura do Estado do Rio de Janeiro, Rio de Janeiro, Brazil

Keyword(s): Document Embedding, Text Classification, Natural Language Processing.

Abstract: Text classification is a popular Natural Language Processing task that aims to predict the categorical values associated with textual instances. One relevant application field for this task is the legal domain, which involves a high volume of unstructured textual documents. This paper proposes a new model for classifying legal opinions related to consumer complaints according to the moral damage value. The proposed model, named MuDEC (Multi-step Document Embedding-Based Classifier), combines Doc2vec for feature extraction with SVM for classification. To optimize classification performance, the model combines several methods: oversampling to handle imbalanced datasets, clustering to identify textual patterns, and dimensionality reduction to control complexity. For performance evaluation, a 6-class dataset of 193 legal opinions related to consumer complaints was created, in which each instance was manually labeled according to its moral damage value. A 10-fold stratified cross-validation resampling procedure was used to evaluate different models. The results demonstrated that, under this experimental setup, MuDEC outperforms baseline models by a significant margin, achieving 78.7% accuracy, compared to 61.1% for a SIF classifier and 65.2% for a C-LSTM classifier.
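The abstract states that models were compared with 10-fold stratified cross-validation and that MuDEC uses oversampling for the imbalanced dataset. The following is a minimal, standard-library sketch of that evaluation scaffolding only, not the authors' implementation. It assumes plain random oversampling (the abstract does not name the oversampling method) applied only inside each training fold, and all function names are hypothetical. The Doc2vec, clustering, dimensionality-reduction and SVM steps are deliberately left out; in practice a library class such as scikit-learn's StratifiedKFold would produce the splits.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Split instance indices into k folds that roughly preserve the
    overall class proportions (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class round-robin across folds, continuing from where the
        # previous class stopped so that fold sizes stay balanced overall.
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return [sorted(fold) for fold in folds]


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_k_fold(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def random_oversample(indices, labels, seed=0):
    """Random oversampling (assumed method): duplicate randomly chosen
    instances of smaller classes until every class among `indices`
    matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for label in sorted(by_class):
        members = by_class[label]
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


if __name__ == "__main__":
    # Toy labels made up for illustration; NOT the paper's class distribution.
    toy_labels = [0] * 40 + [1] * 20 + [2] * 10
    for train, test in train_test_splits(toy_labels, k=10):
        # Oversample only the training part, so duplicated instances never
        # appear in the held-out fold.
        train_balanced = random_oversample(train, toy_labels)
        # A classifier (e.g. an SVM on document embeddings) would be fit on
        # train_balanced and scored on test at this point.
        print(len(test), Counter(toy_labels[i] for i in train_balanced))
```

Because each class is dealt round-robin starting where the previous class stopped, instances are effectively assigned to folds in one global sequence, so fold sizes differ by at most one. Oversampling after the split, rather than before it, is a common precaution against copies of a test instance leaking into training.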

CC BY-NC-ND 4.0

Paper citation in several formats:
Coelho, G.; Celecia, A.; de Sousa, J.; Cavaliere, M.; Lima, M.; Mangeth, A.; Frajhof, I.; Cury, C. and Casanova, M. (2022). Text Classification in the Brazilian Legal Domain. In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-569-2; ISSN 2184-4992, SciTePress, pages 355-363. DOI: 10.5220/0011062000003179

@conference{iceis22,
author={Gustavo M. C. Coelho and Alimed Celecia and Jefferson {de Sousa} and Melissa Cavaliere and Maria Julia Lima and Ana Mangeth and Isabella Frajhof and Cesar Cury and Marco Casanova},
title={Text Classification in the Brazilian Legal Domain},
booktitle={Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2022},
pages={355-363},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011062000003179},
isbn={978-989-758-569-2},
issn={2184-4992},
}

TY  - CONF
JO  - Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - Text Classification in the Brazilian Legal Domain
SN  - 978-989-758-569-2
SN  - 2184-4992
AU  - Coelho, G.
AU  - Celecia, A.
AU  - de Sousa, J.
AU  - Cavaliere, M.
AU  - Lima, M.
AU  - Mangeth, A.
AU  - Frajhof, I.
AU  - Cury, C.
AU  - Casanova, M.
PY  - 2022
SP  - 355
EP  - 363
DO  - 10.5220/0011062000003179
PB  - SciTePress
ER  - 