Authors:
Víctor Labrador 1; Álvaro Peiró 1; Ángel Luís Garrido 2 and Eduardo Mena 2
Affiliations:
1 InSynergy Consulting S.A., Madrid, Spain; 2 SID Research Group, IIS Department, University of Zaragoza, Zaragoza, Spain
Keyword(s):
Machine Learning, Word Embeddings, Automatic Classification, Legal Documents, Performance.
Abstract:
The number of legal documents processed daily is now too large for the work to be done manually. Classifying these documents is one of the most relevant processes, not only because of the importance of the task itself, but also because it is the starting point for other important tasks such as data search or information extraction. In spite of technological advances, classification is still performed by specialized staff, which is expensive, time-consuming, and subject to human error. At best, it is possible to find systems based on statistical approaches whose benefits in terms of efficacy and efficiency are limited. Moreover, overlapping elements in legal documents, such as stamps or signatures, distort the text and hinder these automatic tasks. In this work, we present an approach for automatically classifying these legal documents that exploits the semantic properties of word embeddings. We have implemented our approach so that it can be adapted to different types of documents with little effort. Experiments with real data show promising results, with a large gain in productivity over systems based on other approaches.