NAMED ENTITY RECOGNITION IN BIOMEDICAL LITERATURE USING TWO-LAYER SUPPORT VECTOR MACHINES

Feng Liu, Yifei Chen, Bernard Manderick

Abstract

In this paper, we propose a named entity recognition system for biomedical literature based on two-layer support vector machines. We also employ a post-processing module, the boundary check module, which eliminates some boundary errors and thereby improves system performance. Our system does not use any external lexical resources, which keeps it fairly simple. With carefully designed features and the addition of a second layer, it recognizes named entities in biomedical literature with fairly high accuracy: a precision of 83.5%, a recall of 80.8% and a balanced Fβ=1 score of 82.1%, which is close to state-of-the-art performance at the time of writing.
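As a quick check on the reported figures, the balanced Fβ=1 score is the harmonic mean of precision and recall. The sketch below uses the standard F-measure definition; the function name `f_beta` is our own illustration and is not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (standard F-measure).

    With beta = 1 this is the balanced F-score, 2PR / (P + R).
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract, in percent.
score = f_beta(83.5, 80.8)
print(round(score, 1))  # prints 82.1
```

The result agrees with the 82.1% stated in the abstract, so the three reported numbers are mutually consistent.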

Paper Citation


in Harvard Style

Liu F., Chen Y. and Manderick B. (2007). NAMED ENTITY RECOGNITION IN BIOMEDICAL LITERATURE USING TWO-LAYER SUPPORT VECTOR MACHINES. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-972-8865-89-4, pages 39-45. DOI: 10.5220/0002357300390045


in Bibtex Style

@conference{iceis07,
author={Feng Liu and Yifei Chen and Bernard Manderick},
title={NAMED ENTITY RECOGNITION IN BIOMEDICAL LITERATURE USING TWO-LAYER SUPPORT VECTOR MACHINES},
booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2007},
pages={39-45},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002357300390045},
isbn={978-972-8865-89-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - NAMED ENTITY RECOGNITION IN BIOMEDICAL LITERATURE USING TWO-LAYER SUPPORT VECTOR MACHINES
SN - 978-972-8865-89-4
AU - Liu F.
AU - Chen Y.
AU - Manderick B.
PY - 2007
SP - 39
EP - 45
DO - 10.5220/0002357300390045