POPULATING BIOMEDICAL ONTOLOGIES FROM NATURAL LANGUAGE TEXTS
Juana Maria Ruiz-Martinez, Rafael Valencia-García, Rodrigo Martínez-Béjar, Achim Hoffmann
2010
Abstract
Ontology population is a knowledge acquisition activity that relies on (semi-)automatic methods to transform unstructured, semi-structured and structured data sources into instance data. In this work, we present a semantic-role-based process for ontology population that provides a suitable framework for textual knowledge acquisition in the biological domain. In particular, our approach enriches a given ontology by adding instances gathered from biological natural language texts. The system’s modular architecture provides greater versatility than current approaches in this domain, because the process of ontology population does not depend directly on the linguistic rules developed from the corpus.
Paper Citation
in Harvard Style
Ruiz-Martinez J. M., Valencia-García R., Martínez-Béjar R. and Hoffmann A. (2010). POPULATING BIOMEDICAL ONTOLOGIES FROM NATURAL LANGUAGE TEXTS. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010) ISBN 978-989-8425-29-4, pages 27-36. DOI: 10.5220/0003065000270036
in Bibtex Style
@conference{keod10,
author={Juana Maria Ruiz-Martinez and Rafael Valencia-García and Rodrigo Martínez-Béjar and Achim Hoffmann},
title={POPULATING BIOMEDICAL ONTOLOGIES FROM NATURAL LANGUAGE TEXTS},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010)},
year={2010},
pages={27-36},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003065000270036},
isbn={978-989-8425-29-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010)
TI - POPULATING BIOMEDICAL ONTOLOGIES FROM NATURAL LANGUAGE TEXTS
SN - 978-989-8425-29-4
AU - Ruiz-Martinez J. M.
AU - Valencia-García R.
AU - Martínez-Béjar R.
AU - Hoffmann A.
PY - 2010
SP - 27
EP - 36
DO - 10.5220/0003065000270036