Legal Information Retrieval Based on a Concept-Frequency Representation and Thesaurus

Wagner Costa, Glauco Pedrosa

2023

Abstract

The retrieval of legal information has become one of the main topics in the legal domain, which is characterized by a large volume of digital documents written in a specialized language. This paper presents a novel approach, called BoLC-Th (Bag of Legal Concepts Based on Thesaurus), for representing legal texts based on the Bag-of-Concepts (BoC) approach. The novel contribution of BoLC-Th is to generate weighted histograms of concepts, where the weights are defined from the distance between a word and its respective similar term within a thesaurus. This approach makes it possible to emphasize the words that are most significant for the context, thus generating more discriminative vectors. We performed experimental evaluations comparing the proposed approach with the traditional Bag-of-Words (BoW), TF-IDF and BoC approaches, which are popular techniques for document representation. The proposed method obtained the best result among the evaluated techniques for retrieving judgments and jurisprudence documents. BoLC-Th increased the mAP (mean Average Precision) compared to the traditional BoC approach, while being faster than the traditional BoW and TF-IDF representations. The proposed approach helps enrich a domain with peculiar characteristics, providing a resource for retrieving textual information more accurately and quickly than other techniques based on natural language processing.
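As a rough illustration of the idea of a weighted concept histogram, the following Python sketch counts concepts in a tokenized text and weights each occurrence by how close the word is to the concept term in a thesaurus. It is not the authors' implementation: the toy `THESAURUS`, the integer distances, the function name `concept_histogram`, and the weighting formula `1 / (1 + distance)` are all assumptions made for this example, since the abstract does not specify them.

```python
# Hypothetical toy thesaurus (an assumption for illustration only):
# each concept maps related words to a distance, where 0 is the concept
# term itself and larger values mean a looser relation.
THESAURUS = {
    "appeal": {"appeal": 0, "recourse": 1, "petition": 2},
    "contract": {"contract": 0, "agreement": 1, "covenant": 2},
}


def concept_histogram(tokens, thesaurus):
    """Build a weighted histogram of concepts for a list of tokens.

    Each matched word adds 1 / (1 + distance) to its concept, so words
    closer to the concept term contribute more. This weighting is an
    assumed choice, not the formula from the paper.
    """
    # Invert the thesaurus into word -> (concept, distance), keeping the
    # closest concept when a word is listed under several concepts.
    word_index = {}
    for concept, related in thesaurus.items():
        for word, distance in related.items():
            if word not in word_index or distance < word_index[word][1]:
                word_index[word] = (concept, distance)

    histogram = {concept: 0.0 for concept in thesaurus}
    for token in tokens:
        entry = word_index.get(token.lower())
        if entry is None:
            continue  # word is not covered by the thesaurus
        concept, distance = entry
        histogram[concept] += 1.0 / (1.0 + distance)
    return histogram


tokens = "The appeal cites an agreement and a covenant".split()
histogram = concept_histogram(tokens, THESAURUS)
```

In this sketch the resulting vector has one dimension per concept rather than one per vocabulary word, which is consistent with the abstract's claim that concept-based representations are more compact than BoW or TF-IDF vectors.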


Paper Citation


in Harvard Style

Costa W. and Pedrosa G. (2023). Legal Information Retrieval Based on a Concept-Frequency Representation and Thesaurus. In Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-648-4, SciTePress, pages 303-311. DOI: 10.5220/0011728400003467


in Bibtex Style

@conference{iceis23,
author={Wagner Costa and Glauco Pedrosa},
title={Legal Information Retrieval Based on a Concept-Frequency Representation and Thesaurus},
booktitle={Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2023},
pages={303-311},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011728400003467},
isbn={978-989-758-648-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Legal Information Retrieval Based on a Concept-Frequency Representation and Thesaurus
SN - 978-989-758-648-4
AU - Costa W.
AU - Pedrosa G.
PY - 2023
SP - 303
EP - 311
DO - 10.5220/0011728400003467
PB - SciTePress