Authors:
Wagner Costa and Glauco Pedrosa
Affiliation:
Graduate Program in Applied Computing (PPCA), University of Brasilia (UnB), Brasilia, Brazil
Keyword(s):
Information Retrieval, Data Mining, Natural Language Processing, Deep Learning, Decision Support System.
Abstract:
The retrieval of legal information has become one of the main topics in the legal domain, which is characterized by a huge volume of digital documents written in a peculiar language. This paper presents a novel approach, called BoLC-Th (Bag of Legal Concepts Based on Thesaurus), to represent legal texts based on the Bag-of-Concepts (BoC) approach. The novel contribution of BoLC-Th is to generate weighted histograms of concepts, where each weight is defined by the distance between a word and its respective similar term within a thesaurus. This approach makes it possible to emphasize the words that are most significant for the context, thus generating more discriminative vectors. We performed experimental evaluations comparing the proposed approach with the traditional Bag-of-Words (BoW), TF-IDF and BoC approaches, which are popular techniques for document representation. The proposed method obtained the best result among the evaluated techniques for retrieving judgments and jurisprudence documents. BoLC-Th increased the mAP (mean Average Precision) compared to the traditional BoC approach, while being faster than the traditional BoW and TF-IDF representations. The proposed approach contributes to enriching a domain with peculiar characteristics, providing a resource for retrieving textual information more accurately and quickly than other techniques based on natural language processing.
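The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the thesaurus, the distance measure, or the weighting function, so the toy `THESAURUS`, the integer distances, the weight `1 / (1 + distance)`, and all function names below are assumptions chosen only for illustration.

```python
import math

# Hypothetical toy thesaurus (an assumption, not from the paper):
# concept -> {term: distance of the term from the concept's head term}.
THESAURUS = {
    "contract": {"contract": 0, "agreement": 1, "covenant": 2},
    "appeal": {"appeal": 0, "recourse": 1},
}


def build_term_index(thesaurus):
    """Map each term to its (concept, distance), keeping the closest concept."""
    index = {}
    for concept, terms in thesaurus.items():
        for term, dist in terms.items():
            if term not in index or dist < index[term][1]:
                index[term] = (concept, dist)
    return index


def weighted_concept_histogram(tokens, index, concepts):
    """Build a concept histogram in which closer terms contribute more.

    The weight 1 / (1 + distance) is an assumed weighting function.
    Tokens that are not in the thesaurus are ignored.
    """
    hist = dict.fromkeys(concepts, 0.0)
    for tok in tokens:
        hit = index.get(tok.lower())
        if hit is None:
            continue
        concept, dist = hit
        hist[concept] += 1.0 / (1.0 + dist)
    return [hist[c] for c in concepts]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    index = build_term_index(THESAURUS)
    concepts = sorted(THESAURUS)  # fixed concept order: ["appeal", "contract"]
    doc = "the agreement and the contract under appeal".split()
    query = "recourse against a covenant".split()
    doc_vec = weighted_concept_histogram(doc, index, concepts)
    query_vec = weighted_concept_histogram(query, index, concepts)
    print(concepts, doc_vec, query_vec, cosine(doc_vec, query_vec))
```

Under these assumptions, each document vector has one dimension per concept rather than one per vocabulary word, which is consistent with the abstract's claim that the representation is faster to compare than BoW or TF-IDF vectors. Ranking documents by `cosine` against a query vector is likewise an assumed retrieval step, not one stated in the abstract.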