Authors:
Roberto Gatta¹; Mauro Vallati²; Berardino De Bari¹; Nadia Pasinetti¹; Carlo Cappelli¹; Ilenia Pirola¹; Massimo Salvetti¹; Michela Buglione¹; Maria L. Muiesan¹; Stefano M. Magrini¹ and Maurizio Castellano¹
Affiliations:
¹ University of Brescia, Italy
² University of Huddersfield, United Kingdom
Keyword(s):
Information Retrieval, Text Categorization, Document Classification.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Business Analytics; Data Engineering; Data Mining; Databases and Datawarehousing; Databases and Information Systems Integration; Datamining; Enterprise Information Systems; Health Information Systems; Pattern Recognition and Machine Learning; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Clinical documents stored as unstructured text are a valuable source of information that can be extracted with Information Retrieval techniques. Classification algorithms, and their combination through Ensemble Methods, can help organize this large volume of data, but they are usually tested on standardized corpora, which differ significantly from the clinical documents actually produced in a modern hospital.
In this paper we present the results of a large experimental analysis conducted on 36,000 clinical documents generated by three different medical departments. For this investigation we propose a new classifier based on the concept of entropy, and we test four single algorithms and four ensemble methods. The experimental results show how the selected approaches perform in a real-world environment and highlight the impact of obsolescence on classification.