Disease Identification in Electronic Health Records - An Ontology based Approach

Ioana Barbantan, Camelia Lemnaru, Rodica Potolea


Efficiently exploiting medical data from Electronic Health Records (EHRs) is a current joint research focus of the knowledge extraction and medical communities. Structuring EHRs is essential for exploiting the information they capture, and concept identification and categorization are key tasks to that end. This paper presents a disease identification approach that applies several NLP document pre-processing steps, queries the SNOMED-CT ontology, and then applies a filtering rule to the retrieved information. The hierarchical approach filters concepts more effectively, reducing the number of falsely identified disease concepts. We performed a series of evaluations on the Medline abstracts dataset. The results obtained so far are promising: our method achieves a precision of 87.79% and a recall of 87.12%, better than the results obtained by Apache’s cTAKES system on the same task and dataset.
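The three stages named in the abstract (pre-processing, ontology lookup, hierarchical filtering) can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory dictionary `TOY_PARENTS` is a hypothetical stand-in for SNOMED-CT, the tokenizer is a simple regular expression rather than a full NLP pipeline, and the filtering rule (keep a concept only if one of its is-a ancestors is the toy root "disease") is an assumption about what a hierarchical filter might look like. The function names are likewise illustrative.

```python
import re

# Hypothetical toy is-a hierarchy standing in for SNOMED-CT.
# Keys are concept names, values are the parent concept (None for roots).
TOY_PARENTS = {
    "disease": None,
    "finding": None,
    "procedure": None,
    "diabetes mellitus": "disease",
    "asthma": "disease",
    "pneumonia": "disease",
    "fever": "finding",
    "biopsy": "procedure",
}


def preprocess(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_disease(term):
    """Assumed filtering rule: a concept is kept only if some ancestor
    in the is-a hierarchy is the toy root 'disease'."""
    node = TOY_PARENTS.get(term)
    while node is not None:
        if node == "disease":
            return True
        node = TOY_PARENTS.get(node)
    return False


def identify_diseases(text, max_len=3):
    """Greedy longest-match lookup of n-grams in the toy ontology,
    followed by the hierarchical filter."""
    tokens = preprocess(text)
    found = []
    i = 0
    while i < len(tokens):
        matched_len = 0
        # Try the longest n-gram first so multi-word concepts win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = " ".join(tokens[i:i + n])
            if term in TOY_PARENTS:
                matched_len = n
                if is_disease(term):
                    found.append(term)
                break
        i += matched_len if matched_len else 1
    return found


def precision_recall(predicted, gold):
    """Set-based precision and recall over identified concepts."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = ("Patient with diabetes mellitus and fever underwent "
                "a biopsy; asthma was excluded.")
    diseases = identify_diseases(sentence)
    print(diseases)
    print(precision_recall(diseases,
                           ["diabetes mellitus", "asthma", "pneumonia"]))
```

In this toy setting, "fever" and "biopsy" are found in the ontology but discarded by the filter because their ancestors are not diseases, which is the kind of false positive the hierarchical step is meant to remove. A real system would replace the dictionary lookup with queries against SNOMED-CT and would also need negation handling to avoid reporting excluded conditions.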



Paper Citation

in Harvard Style

Barbantan I., Lemnaru C. and Potolea R. (2014). Disease Identification in Electronic Health Records - An Ontology based Approach. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 261-268. DOI: 10.5220/0005082002610268
