Authors:
Nuno Caetano (1); Raul M. S. Laureano (1) and Paulo Cortez (2)
Affiliations:
(1) Instituto Universitário de Lisboa (ISCTE-IUL), Portugal; (2) University of Minho, Portugal
Keyword(s):
Medical Data Mining, Length of Stay, CRISP-DM, Regression, Random Forest
Related Ontology Subjects/Areas/Topics:
Applications of Expert Systems; Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Data Mining (DM) aims at the extraction of useful knowledge from raw data. In recent decades, hospitals have collected large amounts of data through new methods of electronic data storage, thus increasing the potential value of DM in this domain, in what is known as medical data mining. This work focuses on the case study of a Portuguese hospital, based on a recent and large dataset collected from 2000 to 2013. A data-driven predictive model was obtained for the length of stay (LOS), using as inputs indicators commonly available during the hospitalization process. Based on a regression approach, several state-of-the-art DM models were compared. The best result was obtained by a Random Forest (RF), which achieves a high coefficient of determination (0.81). Moreover, a sensitivity analysis approach was used to extract human-understandable knowledge from the RF model, revealing the top three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized, and the associated medical specialty. Such predictive and explanatory knowledge is valuable for supporting the decisions of hospital managers.
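
The two quantities the abstract relies on, the coefficient of determination and a sensitivity-based ranking of inputs, can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_predict` is a hypothetical stand-in for a fitted regression model, the attribute values and the `age` input are invented for illustration, and the one-at-a-time sensitivity measure shown here is only one simple variant of sensitivity analysis.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def sensitivity_ranking(predict, baseline, levels):
    """One-at-a-time sensitivity analysis.

    Each input is varied over its levels while the others stay at the
    baseline; the spread (max - min) of the predictions is taken as the
    input's importance, normalized so that all importances sum to 1.
    """
    ranges = {}
    for name, values in levels.items():
        preds = [predict({**baseline, name: v}) for v in values]
        ranges[name] = max(preds) - min(preds)
    total = sum(ranges.values()) or 1.0
    return sorted(
        ((name, r / total) for name, r in ranges.items()),
        key=lambda item: item[1],
        reverse=True,
    )


def toy_predict(x):
    """Hypothetical model of length of stay in days (illustration only)."""
    days = 2.0
    days += 5.0 if x["episode_type"] == "inpatient" else 0.0
    days += 2.0 if x["service"] == "surgery" else 0.0
    days += 0.01 * x["age"]
    return days


# Hypothetical baseline record and candidate levels for each input.
baseline = {"episode_type": "ambulatory", "service": "medicine", "age": 50}
levels = {
    "episode_type": ["ambulatory", "inpatient"],
    "service": ["medicine", "surgery"],
    "age": [20, 50, 80],
}

ranking = sensitivity_ranking(toy_predict, baseline, levels)
for name, importance in ranking:
    print(f"{name}: {importance:.2f}")
```

In practice `predict` would wrap a trained model such as a Random Forest, and `r_squared` would be computed on held-out data rather than on the training set.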