A Data-driven Approach to Predict Hospital Length of Stay - A Portuguese Case Study

Nuno Caetano, Raul M. S. Laureano, Paulo Cortez

2014

Abstract

Data Mining (DM) aims at extracting useful knowledge from raw data. Over recent decades, hospitals have collected large amounts of data through new methods of electronic data storage, increasing the potential value of DM in this domain, a field known as medical data mining. This work focuses on a case study of a Portuguese hospital, based on a large and recent dataset collected from 2000 to 2013. A data-driven predictive model was obtained for the length of stay (LOS), using as inputs indicators that are commonly available during the hospitalization process. Following a regression approach, several state-of-the-art DM models were compared. The best result was obtained by a Random Forest (RF), which achieved a high coefficient of determination (R² = 0.81). Moreover, a sensitivity analysis was used to extract human-understandable knowledge from the RF model, revealing the three most influential input attributes: the hospital episode type, the physical service where the patient is hospitalized, and the associated medical specialty. Such predictive and explanatory knowledge is valuable for supporting the decisions of hospital managers.
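The abstract relies on two technical ingredients: the coefficient of determination used to score the regression models, and a sensitivity analysis used to rank the input attributes of the fitted RF. The Python sketch below illustrates both under stated assumptions. R² is computed as 1 - SSE/SST. The sensitivity analysis is a simple one-dimensional variant that sweeps one input across its levels while holding the others at a baseline record, and uses the range of the predicted LOS as the importance measure. The abstract does not specify the exact sensitivity measure, baseline or levels, so `toy_los_model`, `BASELINE`, `LEVELS` and the attribute names are hypothetical stand-ins, not the paper's Random Forest or data.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SSE / SST."""
    y_bar = mean(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    sst = sum((y - y_bar) ** 2 for y in y_true)               # total sum of squares
    if sst == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - sse / sst


def sensitivity_importance(predict, baseline, levels):
    """One-dimensional sensitivity analysis of a black-box regressor.

    predict  -- prediction function of a fitted model: dict -> float
    baseline -- reference record (e.g. mean or mode of each attribute)
    levels   -- attribute name -> list of values to sweep
    Returns attribute name -> relative importance (ranges normalised to sum to 1).
    """
    ranges = {}
    for attr, values in levels.items():
        outputs = []
        for value in values:
            record = dict(baseline)  # hold every other input at its baseline value
            record[attr] = value     # vary only the attribute under study
            outputs.append(predict(record))
        ranges[attr] = max(outputs) - min(outputs)  # output range as sensitivity measure
    total = sum(ranges.values())
    return {attr: (r / total if total else 0.0) for attr, r in ranges.items()}


# Hypothetical stand-in for a fitted model; NOT the paper's Random Forest.
def toy_los_model(record):
    episode = {"inpatient": 6.0, "day_case": 1.0}[record["episode_type"]]
    service = {"ward_A": 2.0, "ward_B": 0.5}[record["service"]]
    return episode + service + 0.01 * record["age"]  # predicted LOS in days


# Hypothetical baseline record and sweep levels.
BASELINE = {"episode_type": "inpatient", "service": "ward_A", "age": 50}
LEVELS = {
    "episode_type": ["inpatient", "day_case"],
    "service": ["ward_A", "ward_B"],
    "age": [20, 50, 80],
}

if __name__ == "__main__":
    print(r_squared([3, 5, 7, 9], [2.5, 5, 7.5, 9]))
    importance = sensitivity_importance(toy_los_model, BASELINE, LEVELS)
    print(sorted(importance, key=importance.get, reverse=True))
```

With these hypothetical inputs the printed ranking is `['episode_type', 'service', 'age']`. Because the analysis only calls `predict`, it treats the model as a black box and would work unchanged with any fitted regressor; other sensitivity measures, such as the variance of the swept outputs, could be substituted for the range.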



Paper Citation


in Harvard Style

Caetano N., Laureano R. and Cortez P. (2014). A Data-driven Approach to Predict Hospital Length of Stay - A Portuguese Case Study. In Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-027-7, pages 407-414. DOI: 10.5220/0004892204070414


in Bibtex Style

@conference{iceis14,
author={Nuno Caetano and Raul M. S. Laureano and Paulo Cortez},
title={A Data-driven Approach to Predict Hospital Length of Stay - A Portuguese Case Study},
booktitle={Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2014},
pages={407-414},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004892204070414},
isbn={978-989-758-027-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A Data-driven Approach to Predict Hospital Length of Stay - A Portuguese Case Study
SN - 978-989-758-027-7
AU - Caetano N.
AU - Laureano R.
AU - Cortez P.
PY - 2014
SP - 407
EP - 414
DO - 10.5220/0004892204070414