Authors:
Wouter Massa¹; Parisa Kordjamshidi²; Thomas Provoost¹ and Marie-Francine Moens¹
Affiliations:
¹ KU Leuven, Belgium; ² University of Illinois at Urbana-Champaign and KU Leuven, United States
Keyword(s):
Natural Language Processing, Text Mining, Relation Extraction, BioNLP, Bioinformatics, Bacteria, Bacteria Biotopes.
Related Ontology Subjects/Areas/Topics:
Algorithms and Software Tools; Bioinformatics; Biomedical Engineering; Data Mining and Machine Learning; Pattern Recognition, Clustering and Classification
Abstract:
The tremendous amount of scientific literature available about bacteria and their biotopes underlines the need for efficient mechanisms to extract this information automatically. This paper presents a system that extracts bacteria and their habitats, as well as the relations between them. We investigate to what extent current techniques are suited to this task and test a variety of models. To detect entities in biological text, we use a linear-chain Conditional Random Field (CRF). To predict relations between the entities, we build a model based on logistic regression. Building a system on these techniques, we explore several improvements to both the generation and the selection of good candidates. One contribution here is the extended flexibility of our ontology mapper, which allows more advanced boundary detection. Furthermore, we find that combining several distinct candidate generation rules is beneficial. Using these techniques, we show results that significantly improve upon the state of the art for the BioNLP Bacteria Biotopes task.
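The abstract names two learning components (a linear-chain CRF for entity detection and a logistic-regression relation classifier) and credits part of the gain to combining several candidate generation rules. The paper's actual rules, features and weights are not given here, so the following Python sketch is illustrative only. It assumes entity mentions have already been tagged (for example by a CRF), uses two hypothetical rules (`same_sentence_rule`, `nearest_habitat_rule`) whose union forms the candidate set, and scores each bacterium-habitat candidate with a logistic function over hypothetical features and hand-set weights rather than learned ones.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mention:
    """An entity mention as an upstream tagger might output it (hypothetical format)."""
    text: str
    label: str      # "Bacterium" or "Habitat"
    sentence: int   # index of the sentence containing the mention
    start: int      # document-level token offset of the mention's first token


def same_sentence_rule(bacteria, habitats):
    """Hypothetical rule A: pair every bacterium with every habitat in the same sentence."""
    return {(b, h) for b, h in product(bacteria, habitats) if b.sentence == h.sentence}


def nearest_habitat_rule(bacteria, habitats, max_sentence_gap=1):
    """Hypothetical rule B: pair each bacterium with its closest habitat, allowing
    neighbouring sentences so some cross-sentence relations are still proposed."""
    pairs = set()
    for b in bacteria:
        nearby = [h for h in habitats if abs(h.sentence - b.sentence) <= max_sentence_gap]
        if nearby:
            closest = min(nearby, key=lambda h: abs(h.start - b.start))
            pairs.add((b, closest))
    return pairs


def generate_candidates(mentions):
    bacteria = [m for m in mentions if m.label == "Bacterium"]
    habitats = [m for m in mentions if m.label == "Habitat"]
    # Combine rules by taking the union, so each rule can recover pairs the other misses.
    return same_sentence_rule(bacteria, habitats) | nearest_habitat_rule(bacteria, habitats)


def features(b, h):
    """Hypothetical feature vector for a candidate pair."""
    return [
        1.0,                                    # bias term
        1.0 if b.sentence == h.sentence else 0.0,
        abs(h.start - b.start) / 10.0,          # scaled token distance
    ]


# Hypothetical weights; a real system would learn these from annotated training data.
WEIGHTS = [0.5, 1.5, -0.8]


def relation_probability(b, h, weights=WEIGHTS):
    """Logistic regression prediction: sigmoid of a weighted sum of features."""
    z = sum(w * x for w, x in zip(weights, features(b, h)))
    return 1.0 / (1.0 + math.exp(-z))


def extract_relations(mentions, threshold=0.5):
    """Score every candidate pair and keep those at or above the threshold."""
    scored = [(b, h, relation_probability(b, h)) for b, h in generate_candidates(mentions)]
    kept = [(b.text, h.text, round(p, 3)) for b, h, p in scored if p >= threshold]
    return sorted(kept, key=lambda r: (r[0], r[1]))


if __name__ == "__main__":
    mentions = [
        Mention("Bifidobacterium longum", "Bacterium", sentence=0, start=0),
        Mention("human gut", "Habitat", sentence=0, start=5),
        Mention("Lactobacillus", "Bacterium", sentence=1, start=9),
    ]
    print(extract_relations(mentions))
```

In this toy input, rule A alone proposes only the same-sentence pair, while rule B also proposes the cross-sentence pair linking "Lactobacillus" to "human gut"; with these illustrative weights both candidates pass the 0.5 threshold. The point of the union is recall at the candidate stage, leaving precision to the classifier.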