Semantic Annotation of UMLS using Conditional Random Fields

Shahad Kudama, Rafael Berlanga

2014

Abstract

In this work, we present a first approach to the semantic annotation of Unified Medical Language System (UMLS®) concept descriptions, based on extracting relevant linguistic features and using them in conditional random fields (CRF) to classify the descriptions into the semantic groups provided by UMLS. Experiments were carried out over the whole set of UMLS concepts (more than 1 million). The precision obtained in the global system evaluation is high, approximately between 70% and 80%, depending on the percentage of semantic information provided as input. By semantic group, precision even reaches 100% in the groups with the highest representation in the selected UMLS descriptions.


Paper Citation


in Harvard Style

Kudama S. and Berlanga R. (2014). Semantic Annotation of UMLS using Conditional Random Fields. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 335-341. DOI: 10.5220/0005131003350341


in Bibtex Style

@conference{kdir14,
author={Shahad Kudama and Rafael Berlanga},
title={Semantic Annotation of UMLS using Conditional Random Fields},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={335-341},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005131003350341},
isbn={978-989-758-048-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - Semantic Annotation of UMLS using Conditional Random Fields
SN - 978-989-758-048-2
AU - Kudama S.
AU - Berlanga R.
PY - 2014
SP - 335
EP - 341
DO - 10.5220/0005131003350341