Learning Diagnosis from Electronic Health Records

Ioana Barbantan, Rodica Potolea


To build a complete solution for an assistive medical decision support system, we proposed a flow that integrates a sequence of modules, each targeting a different data engineering task. The solution can analyse any type of unstructured medical document: specific NLP steps are applied first, followed by semantic analysis that identifies the medical concepts and thus imposes a structure on the input documents. The data collection, document pre-processing, concept extraction, and correlation modules were researched in our previous work, where we proposed original solutions for each. The collected, structured representation of the medical records supports informed decisions about the health status of patients. The current paper focuses on the prediction module, which joins all the components in a logical flow and completes it with a suggested diagnosis classification for the patient. The accuracy of 81.25% obtained on the medical documents supports the strength of the proposed strategy.



Paper Citation

in Harvard Style

Barbantan I. and Potolea R. (2016). Learning Diagnosis from Electronic Health Records. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016), ISBN 978-989-758-203-5, pages 344-351. DOI: 10.5220/0006069503440351
