Data Mining for the Unique Identification of Patients in the National Healthcare Systems

D. G. Ramírez-Ríos, Laura P. Manotas Romero, Heyder Paez-Logreira, Luis Ramírez, Yohany Andrés Jimenez Florez


This paper considers the application of data mining (DM) algorithms as a feasible and necessary strategy for the optimal management of databases (DB) in national healthcare systems. Specifically, it deals with the management of multiple DB that hold patients' affiliation information under the supervision of the healthcare authorities, an issue that concerns not only the records of every citizen but also each citizen's right to receive comprehensive care at any institution. We argue that the administrative side of the healthcare system should not obstruct patient care and that full efficiency must be guaranteed. We believe that DM algorithms are appropriate for this task and that human intervention should be minimized. A case study was developed in Colombia that considered multiple affiliations across DB and their integration into a unique DB managed by the District Health Secretary (DHS); this process detected frauds and other types of duplicates. The approach not only yields a significant reduction in manual intervention on the DB but also allows data to be extracted for future analysis, supporting patients' need for efficient and comprehensive healthcare as well as the privacy of their registered personal information.
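The abstract does not say which DM algorithms were used, so the following is only a rough illustration of the integration step: it merges records from several affiliation databases into one registry and flags patients who appear in more than one source. It assumes each record carries a document ID and a full name; the field names `doc_id` and `name`, the source names `db_a`/`db_b`, and every function here are hypothetical. Exact matching on a normalized key is a deliberately simple stand-in, not the authors' method.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def patient_key(record: dict) -> tuple:
    """Hypothetical matching key: trimmed document ID plus normalized full name."""
    return (record["doc_id"].strip(), normalize_name(record["name"]))


def build_unique_registry(sources: dict) -> dict:
    """Merge records from several affiliation DBs into one registry.

    `sources` maps a (hypothetical) source-DB name to a list of records,
    each a dict with 'doc_id' and 'name'. Records sharing a key collapse
    into one entry holding the set of sources they came from.
    """
    registry = defaultdict(set)
    for source, records in sources.items():
        for record in records:
            registry[patient_key(record)].add(source)
    return dict(registry)


def multiple_affiliations(registry: dict) -> dict:
    """Return only the patients that appear in more than one source DB."""
    return {key: sorted(srcs) for key, srcs in registry.items() if len(srcs) > 1}


if __name__ == "__main__":
    sources = {
        "db_a": [{"doc_id": "1001", "name": "Luis Ramírez"},
                 {"doc_id": "2002", "name": "Ana Pérez"}],
        "db_b": [{"doc_id": "1001 ", "name": "LUIS  RAMIREZ"}],
    }
    registry = build_unique_registry(sources)
    print(len(registry))                    # 2 unique patients
    print(multiple_affiliations(registry))  # {('1001', 'luis ramirez'): ['db_a', 'db_b']}
```

Exact-key matching will miss typos or transposed document numbers; catching those would require approximate (fuzzy or probabilistic) record linkage, which this sketch does not attempt.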



Paper Citation

in Harvard Style

Ramírez-Ríos D. G., Manotas Romero L. P., Paez-Logreira H., Ramírez L. and Jimenez Florez Y. A. (2015). Data Mining for the Unique Identification of Patients in the National Healthcare Systems. In Proceedings of the International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES, ISBN 978-989-758-075-8, pages 211-217. DOI: 10.5220/0005287302110217

in Bibtex Style

author={D. G. Ramírez-Ríos and Laura P. Manotas Romero and Heyder Paez-Logreira and Luis Ramírez and Yohany Andrés Jimenez Florez},
title={Data Mining for the Unique Identification of Patients in the National Healthcare Systems},
booktitle={Proceedings of the International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES},
year={2015},
pages={211-217},
isbn={978-989-758-075-8},
doi={10.5220/0005287302110217},

in EndNote Style

JO - Proceedings of the International Conference on Operations Research and Enterprise Systems - Volume 1: ICORES
TI - Data Mining for the Unique Identification of Patients in the National Healthcare Systems
SN - 978-989-758-075-8
AU - Ramírez-Ríos D. G.
AU - Manotas Romero L. P.
AU - Paez-Logreira H.
AU - Ramírez L.
AU - Jimenez Florez Y. A.
PY - 2015
SP - 211
EP - 217
DO - 10.5220/0005287302110217