concept X in the gold standard, the model should ideally recognize that X corresponds to the phrase. However, automated models are not perfect and sometimes make mistakes. In such cases, the desired outcome is for the model to recognize that concepts semantically similar to X (such as X's parent) exist in the ontology and to associate those concepts with the text. Ontology-sentient models take the ontology structure, i.e., the ontology graph, as input during training and make predictions accordingly. However, developing accurate ontology-sentient models is challenging because the size and complexity of the ontology graph often result in models that are too large or that require an inordinate amount of time to train.
The automated annotation models previously de-
veloped by this team (Manda et al., 2018; Manda
et al., 2020; Devkota et al., 2022b; Devkota et al.,
2022a) have shown good accuracy in recognizing on-
tology concepts from text. In most cases, these systems predict the same ontology concept as the ground truth in the gold standard data, achieving perfect accuracy. However, ontology-based prediction systems can also earn partial accuracy: a model may not predict the exact ontology concept given in the gold standard but instead a related concept (a sub-class or super-class), which counts as a partial match. In the cases where our models did not predict the ground truth exactly, they failed to achieve reasonable partial accuracy and instead predicted concepts that were highly unrelated to the ground truth. Thus, our models' accuracy could be improved by raising their partial accuracy in the instances where they fail to make an exact prediction.
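To make the notion of partial accuracy concrete, the sketch below scores a prediction against a gold standard concept using a toy parent map. The GO identifiers, the single-parent traversal, and the 0.5 partial credit are illustrative assumptions, not the scoring scheme used in our systems.

# Illustrative sketch of partial-accuracy scoring (assumed values, not our exact metric).
PARENT = {
    "GO:0006915": "GO:0012501",  # apoptotic process -> programmed cell death
    "GO:0012501": "GO:0008219",  # programmed cell death -> cell death
}

def ancestors(concept, parent_map):
    """Collect all ancestors of a concept by following parent links."""
    found = set()
    while concept in parent_map:
        concept = parent_map[concept]
        found.add(concept)
    return found

def partial_accuracy(predicted, gold, parent_map):
    """Score 1.0 for an exact match, 0.5 for a sub-/super-class, else 0.0."""
    if predicted == gold:
        return 1.0
    related = (predicted in ancestors(gold, parent_map)
               or gold in ancestors(predicted, parent_map))
    return 0.5 if related else 0.0

# Predicting the parent of the gold concept earns partial credit.
print(partial_accuracy("GO:0012501", "GO:0006915", PARENT))  # 0.5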
Motivated by these observations, we present here an alternative approach, called Ontology-powered Boosting (OB), that improves the prediction performance of automated curation models by using information about the ontology hierarchy to post-process a model's predictions after training has completed. The goal of OB is to combine the model's preliminary predictions with knowledge of the ontology hierarchy to selectively increase the confidence of certain predictions and thereby improve overall prediction accuracy. The method relies on a computationally inexpensive calculation and avoids bloated machine learning models that cannot be trained or deployed without enormous resources.
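To illustrate the post-processing idea, the sketch below raises the confidence of a concept when its parent or child in the ontology also received a high preliminary score. The boost_factor, the neighbor rule, and the data structures are assumptions made for illustration only; they are not the exact OB calculation presented in this paper.

# Illustrative sketch of ontology-hierarchy-based boosting (assumed rule and factor).
def ontology_boost(scores, parent_map, boost_factor=0.1):
    """Raise the confidence of concepts whose parent or child also scored well.

    scores:     concept ID -> preliminary model confidence in [0, 1]
    parent_map: child concept ID -> parent concept ID
    """
    # Invert the parent map so children can be looked up as well.
    children = {}
    for child, parent in parent_map.items():
        children.setdefault(parent, set()).add(child)

    boosted = {}
    for concept, score in scores.items():
        neighbors = set(children.get(concept, set()))
        if concept in parent_map:
            neighbors.add(parent_map[concept])
        # Add a fraction of the strongest neighboring score, capped at 1.0.
        support = max((scores.get(n, 0.0) for n in neighbors), default=0.0)
        boosted[concept] = min(1.0, score + boost_factor * support)
    return boosted

# Example: a concept whose parent is predicted confidently gets a small boost.
preliminary = {"GO:0006915": 0.40, "GO:0012501": 0.90}
print(ontology_boost(preliminary, {"GO:0006915": "GO:0012501"}))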
Note that the contribution of this work is not a novel architecture for recognizing ontology concepts but rather the Ontology-powered Boosting approach for further improving the prediction accuracy of our previously published deep learning architectures. Hence, we present architecture details only briefly and refer the reader to our prior work for complete details.
2 BACKGROUND
Automated methods for recognizing ontology concepts in the literature have been developed over the last decade, with approaches ranging from lexical analysis to traditional machine learning and, more recently, deep learning.
Text mining tools based on traditional machine learning employ supervised learning techniques trained on gold standard corpora (Beasley and Manda, 2018). In 2018, we surveyed ontology-based Named Entity Recognition and formally compared methods and tools for recognizing ontology concepts from scientific literature. Three concept recognition tools, MetaMap (Aronson, 2001), NCBO Annotator (Jonquet et al., 2009), and Textpresso (Müller et al., 2004), were compared (Beasley and Manda, 2018). These methods can form generalizable associations between text and ontology concepts, leading to improved accuracy.
The rise of deep learning in image and speech recognition has carried over to text-based problems as well. Preliminary research has shown
that deep learning methods result in greater accu-
racy for text-based tasks including identifying ontol-
ogy concepts in text (Lample et al., 2016; Habibi
et al., 2017; Lyu et al., 2017; Wang et al., 2018;
Manda et al., 2020). Deep learning methods use vector representations, built from enriched character and word embeddings learned from training data, to capture dependencies and relationships between words (Casteleiro et al., 2018). We
evaluated the feasibility of using deep learning for the
task of recognizing ontology concepts in a 2018 study
(Manda et al., 2018). We compared Gated Recurrent Units (GRUs), Long Short-Term Memory (LSTM) networks, Recurrent Neural Networks (RNNs), and Multi-Layer Perceptrons (MLPs), evaluating their performance on the CRAFT gold standard dataset. We also introduced a new deep learning architecture that combines multiple GRUs with a character- and word-based input. We used data from five ontologies in the CRAFT corpus as a gold standard to evaluate our model's performance. Results showed that our
GRU-based model outperformed prior models across
all five ontologies. These findings indicated that deep
learning algorithms are a promising avenue to be ex-
plored for automated ontology-based curation of data.
This study also served as a formal comparison and
guideline for building and selecting deep learning