Towards Creating an Iso-semantic Lexicon Model using Computational Semantics and Sublanguage Analysis Within Clinical Subdomains for Medical Language Processing

B. S. Begum Durgahee, Adi Gundlapalli

2015

Abstract

Although the widespread adoption of Electronic Health Records (EHRs) has made clinical data available in electronic format, a significant amount of important information is still recorded as unstructured narrative (free text). This complicates the use of these clinical data for decision support and research. Recent efforts have focused on applying natural language processing (NLP) and information extraction (IE) techniques to clinical text. A common practice is to construct semantic lexicons manually and use them to identify and extract clinical entities for specific tasks such as cohort identification and phenotyping. Besides requiring intensive manual effort as well as linguistic and medical expertise, the resulting vocabularies tend to be restricted to specific institutions and groups of users. There is no standardized way of building lexicons, and the resulting inconsistent word usage impedes the performance of NLP and IE systems. The objective of the proposed research study is to find a method of leveraging semantic lexicons so that they can be shared for information extraction from clinical text. Current NLP tools mostly focus on clinical entity extraction by mapping textual elements to available ontologies. This approach is insufficient because ontologies are incomplete and many entities are context-dependent. Hence, a deeper understanding of the relations among these entities is needed in order to expand existing dictionaries accordingly. Lexico-semantic relations and patterns in heterogeneous clinical text will be detected and characterized as sublanguage-specific patterns. The significant relations and patterns discovered will be combined with unsupervised methods, formal concept analysis, distributional analysis techniques and existing ontologies to inform the design of a learning-based system for the automatic construction of clinical ontology-based lexicons. Semantic Web technologies will be investigated to build common ontology-based lexicons using ontological and lexical representations.
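Formal concept analysis (FCA), one of the techniques named above, can be illustrated with a small sketch. It assumes each candidate term has already been paired with the sublanguage contexts it occurs in; the context below, including attribute names such as takes_X, is a hypothetical toy example and not data from this study. FCA groups terms that share exactly the same set of contexts into formal concepts (extent, intent), which could serve as candidate lexicon categories. The enumeration is deliberately naive and only suited to small contexts.

```python
from itertools import combinations

# Hypothetical toy formal context (illustrative only, not study data):
# objects are candidate clinical terms, attributes are sublanguage
# contexts in which each term was observed (X marks the term's slot).
CONTEXT = {
    "metformin": {"takes_X", "X_mg", "started_on_X"},
    "lisinopril": {"takes_X", "X_mg", "started_on_X"},
    "aspirin": {"takes_X", "X_mg"},
    "hypertension": {"history_of_X", "denies_X"},
    "diabetes": {"history_of_X", "denies_X", "X_mellitus"},
}


def extent(context, intent):
    """Objects possessing every attribute in `intent`."""
    return frozenset(obj for obj, attrs in context.items() if intent <= attrs)


def formal_concepts(context):
    """Enumerate all formal concepts as (extent, intent) pairs.

    Every concept intent is an intersection of object intents (the empty
    intersection being the full attribute set), so we close the object
    intents under pairwise intersection, then derive each extent.
    Naive and quadratic per pass: fine for toy contexts only.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs} | {frozenset(attrs) for attrs in context.values()}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(intents), 2):
            meet = a & b
            if meet not in intents:
                intents.add(meet)
                changed = True
    concepts = [(extent(context, i), i) for i in intents]
    # Largest extents first; ties broken alphabetically for stable output.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0])))


if __name__ == "__main__":
    for ext, intent in formal_concepts(CONTEXT):
        print(sorted(ext), "<->", sorted(intent))
```

On this toy context, metformin and lisinopril fall into one concept defined by the medication-like contexts, while hypertension and diabetes share the problem-like contexts. Realistic contexts would call for a dedicated algorithm such as NextClosure rather than repeated pairwise intersection.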
The ultimate goal of using Semantic Web technologies will be to interlink lexical resources with biomedical ontologies in a computable form so that they can be shared. This research proposal will contribute to the discovery of new concepts and relations in the clinical domain of interest. By automating ontology-based lexicon construction with minimal supervision, we intend to improve word sense disambiguation and text processing so that accurate results are retrieved. The resulting ontology-based semantic lexicon model will provide a new perspective on standardizing semantic lexicons to facilitate content interoperability for clinical text mining and natural language processing tasks. Such a model will be helpful in predictive modeling studies for personalized healthcare, supporting better health care with more efficient use of limited resources.
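To make the interlinking goal more concrete, the following sketch serializes a few lexicon entries as RDF N-Triples using only the standard library. It assumes the W3C OntoLex-Lemon vocabulary (a successor to the lemon model) as the lexical representation; that vocabulary choice is an assumption, since no specific vocabulary is named here. The lexicon namespace and the concept IRI are hypothetical placeholders standing in for a real biomedical ontology class. Each surface term, whether a full name, an abbreviation or a lay synonym, becomes its own ontolex:LexicalEntry whose sense references the same ontology concept, so variants used at different institutions resolve to one shared node.

```python
import re

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LEXICON = "http://example.org/clinical-lexicon/"  # hypothetical namespace


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str, lang: str = "en") -> str:
    """N-Triples language-tagged literal with backslash/quote escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def slug(text: str) -> str:
    """Turn a surface term into a safe IRI local name."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


def lexicon_triples(terms, concept_iri):
    """Emit N-Triples linking each term to one shared ontology concept.

    Each term gets its own ontolex:LexicalEntry with a canonical form and a
    sense whose ontolex:reference is the concept, so synonyms and
    abbreviations converge on the same ontology node.
    """
    triples = []
    for term in terms:
        entry = LEXICON + slug(term)
        form, sense = entry + "_form", entry + "_sense"
        triples += [
            f"{iri(entry)} {iri(RDF_TYPE)} {iri(ONTOLEX + 'LexicalEntry')} .",
            f"{iri(entry)} {iri(ONTOLEX + 'canonicalForm')} {iri(form)} .",
            f"{iri(form)} {iri(ONTOLEX + 'writtenRep')} {literal(term)} .",
            f"{iri(entry)} {iri(ONTOLEX + 'sense')} {iri(sense)} .",
            f"{iri(sense)} {iri(ONTOLEX + 'reference')} {iri(concept_iri)} .",
        ]
    return triples


if __name__ == "__main__":
    # Hypothetical concept IRI standing in for a real ontology class.
    concept = "http://example.org/ontology/Hypertension"
    for line in lexicon_triples(["hypertension", "HTN", "high blood pressure"], concept):
        print(line)
```

The output can be loaded into any RDF store. Modelling each variant as a separate entry, rather than as an extra form of one entry, follows the OntoLex convention that distinct words denoting the same concept are distinct lexical entries.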


Paper Citation


in Harvard Style

Durgahee B. and Gundlapalli A. (2015). Towards Creating an Iso-semantic Lexicon Model using Computational Semantics and Sublanguage Analysis Within Clinical Subdomains for Medical Language Processing. In Doctoral Consortium - DCBIOSTEC, (BIOSTEC 2015), pages 42-51.


in Bibtex Style

@conference{dcbiostec15,
author={B. S. Begum Durgahee and Adi Gundlapalli},
title={Towards Creating an Iso-semantic Lexicon Model using Computational Semantics and Sublanguage Analysis Within Clinical Subdomains for Medical Language Processing},
booktitle={Doctoral Consortium - DCBIOSTEC, (BIOSTEC 2015)},
year={2015},
pages={42-51},
publisher={SciTePress},
organization={INSTICC},
}


in EndNote Style

TY  - CONF
JO  - Doctoral Consortium - DCBIOSTEC, (BIOSTEC 2015)
TI  - Towards Creating an Iso-semantic Lexicon Model using Computational Semantics and Sublanguage Analysis Within Clinical Subdomains for Medical Language Processing
AU  - Durgahee B.
AU  - Gundlapalli A.
PY  - 2015
SP  - 42
EP  - 51
ER  -