Authors:
Robert A. N. de Oliveira and Methanias Colaço Júnior
Affiliation:
UFS - Universidade Federal de Sergipe, Brazil
Keyword(s):
Dimensionality Reduction, Experimental Analysis, Jurisprudence, Stemming.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Performance Evaluation and Benchmarking; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Stemming algorithms are commonly used during the text preprocessing phase to reduce data dimensionality. However, the effectiveness of this reduction varies with the domain to which it is applied. This work therefore presents an experimental analysis of dimensionality reduction by stemming on a real base of judicial jurisprudence formed by four subsets of documents. With such a document base, it is necessary to adopt techniques that increase the efficiency of storing and searching this information; otherwise, both computing resources and access to justice suffer, as stakeholders may not find the documents they need to plead their rights. The results show that, depending on the algorithm and the collection, the number of unique terms in the documents may be reduced by up to 52%. Furthermore, we found a strong correlation between the reduction percentage and the number of unique terms in the original document. Among the stemming algorithms analyzed, RSLP was the most effective in terms of dimensionality reduction in all four collections studied, and it excelled when applied to judgments of the Appeals Court.
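As a rough illustration of the quantity discussed in the abstract, the sketch below computes the percentage reduction in unique terms after stemming a tokenized document. It is not the authors' experimental code: `toy_stem` is a hypothetical stand-in that strips a handful of Portuguese suffixes, not RSLP or any other algorithm evaluated in the paper, and the function names and sample tokens are assumptions made for the example.

```python
def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer (NOT RSLP): strips a few Portuguese suffixes."""
    for suffix in ("mente", "ções", "ção", "es", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def reduction_percentage(tokens, stem=toy_stem) -> float:
    """Percentage of unique terms eliminated when each term is replaced by its stem."""
    unique_before = set(tokens)
    if not unique_before:
        return 0.0
    unique_after = {stem(term) for term in unique_before}
    return 100.0 * (len(unique_before) - len(unique_after)) / len(unique_before)


# Made-up sample tokens, used only to exercise the function.
tokens = "recurso recursos apelação apelações juiz juizes sentença".split()
reduction = reduction_percentage(tokens)
print(f"{reduction:.1f}% fewer unique terms after stemming")
```

Passing a different `stem` callable would allow several stemmers to be compared on the same collection under this simplified measure.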