Cross-Domain Classification of Domain Entities into Top-Level Ontology Concepts Using BERT: A Study Case on the BFO Domain Ontologies

Alcides Lopes, Joel Carbonera, Nicolau Santos, Fabricio Rodrigues, Luan Garcia and Mara Abel

Universidade Federal do Rio Grande do Sul, Institute of Informatics, Porto Alegre, Brazil

Keywords:

Top-Level Ontology Classiﬁcation, Cross-Domain Classiﬁcation, Informal Deﬁnition, Language Model.

Abstract:

Classifying domain entities into top-level ontology concepts using informal deﬁnitions remains an active re-

search area with several open questions. One of these questions pertains to the quality of proposed pipelines

employing language models for classifying informal deﬁnitions when training and testing samples are from

different knowledge domains. This can introduce challenges due to varying vocabularies across domains or the

potential for an entity to belong to different top-level concepts based on its domain. In this study, we present

a study case where terms and informal deﬁnitions are extracted from 81 domain ontologies organized into 12

knowledge domains. We investigate the performance of a pipeline that utilizes the BERT language model for

classifying domain entities into top-level concepts within a cross-domain classiﬁcation scenario. Additionally,

we explore various pipeline setups for input, preprocessing, and training steps. Our optimal classiﬁer setup

employs an unbalanced training methodology, no text preprocessing, and the concatenation of terms and in-

formal deﬁnitions as input. Furthermore, we demonstrate that BERT yields promising results in classifying

domain entities into top-level concepts within a cross-domain classiﬁcation scenario.

1 INTRODUCTION

The explosion of digital data and the proliferation of

information sources have created an urgent need for

effective methods to organize and classify domain en-

tities. Ontologies, serving as structured and formal

knowledge representations, are pivotal in integrating

information across domains. Top-level ontologies,

which deﬁne high-level and domain-independent con-

cepts and relationships, are key to achieving data

interoperability among various domain ontologies.

However, the classiﬁcation of domain entities into

concepts deﬁned in top-level ontologies is tradition-

ally manual and demands signiﬁcant expertise in the

target domain and ontology engineering (Lopes et al.,

2022). The automation of this classiﬁcation process is

essential for streamlining ontology development, potentially saving time and effort for engineers. Despite some progress, the field still faces unresolved challenges.

Author ORCID iDs:
https://orcid.org/0000-0003-0622-6847
https://orcid.org/0000-0002-4499-3601
https://orcid.org/0000-0003-0901-2465
https://orcid.org/0000-0002-0615-8306
https://orcid.org/0000-0001-9328-9007
https://orcid.org/0000-0002-9589-2616

This study focuses on classifying domain enti-

ties into top-level ontology concepts across different

knowledge domains. We analyze 81 domain ontolo-

gies across 12 knowledge domains, using the Basic

Formal Ontology (BFO) (Arp et al., 2015) for our tar-

get top-level ontology concepts, diverging from previ-

ous works that employed Dolce-Lite-Plus (DLP) (Jul-

lien et al., 2022; Lopes et al., 2022; Lopes et al.,

2023). By leveraging the BERT language model (De-

vlin et al., 2018) in the classiﬁcation model proposed

in (Lopes et al., 2023), we investigate various aspects

of classiﬁcation, including input structure and prepro-

cessing and the balance of the classes in the training

data.

In the scope of this work, we are not interested in evaluating the performance of state-of-the-art language models, such as Llama (Touvron et al., 2023) and Mixtral (Jiang et al., 2024), or the current trends in ChatGPT and prompt engineering (Sahoo et al., 2024). Instead, we step back to analyze the consequences of the decisions made while developing a classification pipeline, building on an already tested related pipeline. Our experiments show that unbalanced training without text preprocessing achieved the best overall results, with an average macro F-score of 62% across the 12 domains evaluated. Although this average is modest, our experiments demonstrate that we achieve an F-score above 90% for many target classes in many domains, contributing to the ongoing discussion on cross-domain classification of domain entities into top-level ontology concepts.

Lopes, A., Carbonera, J., Santos, N., Rodrigues, F., Garcia, L. and Abel, M.
Cross-Domain Classification of Domain Entities into Top-Level Ontology Concepts Using BERT: A Study Case on the BFO Domain Ontologies.
DOI: 10.5220/0012557600003690
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 26th International Conference on Enterprise Information Systems (ICEIS 2024) - Volume 2, pages 141-148
ISBN: 978-989-758-692-7; ISSN: 2184-4992

The paper is organized as follows. In Section 2,

we present the background notions supporting this

case study, revisiting ontologies, top-level ontologies,

and the Open Biological and Biomedical Ontologies

(OBO) Foundry. In Section 3, we describe the re-

search questions we aim to answer in this case study

and how we extracted the datasets from the OBO

Foundry. Section 4 showcases the experiments per-

formed using the extracted datasets from different

knowledge domains and presents the obtained results.

Finally, Section 5 offers concluding remarks on our

work.

2 RELATED WORK

This section covers three primary topics. First, we

delve into ontologies, exploring their signiﬁcance in

knowledge modeling, knowledge-sharing, and inter-

operability, with a speciﬁc focus on the OBO Foundry

and the Basic Formal Ontology (BFO). Second, we

provide an overview of language models, with a par-

ticular emphasis on BERT, highlighting their capac-

ity to comprehend human language. Finally, we dis-

cuss various approaches to classify domain entities

into top-level ontology concepts.

2.1 Ontologies, OBO Foundry and the Basic Formal Ontology

Ontologies serve as structured frameworks for rep-

resenting and organizing domain-speciﬁc knowledge.

They are formally and explicitly speciﬁed shared

conceptualizations that capture entities, relationships,

and constraints within domains (Studer et al., 1998).

Also, ontologies support knowledge sharing and com-

munication between humans and machines, offering

a common vocabulary and deﬁned relationships for

data integration and interoperability across systems

and applications. The Open Biological and Biomedi-

cal Ontologies (OBO) Foundry is an initiative aimed

at developing interoperable ontologies to represent

and integrate data across various knowledge domains,

including biomedicine, agriculture, and information

technology, among others (Jackson et al., 2021; Smith

et al., 2007). It provides a framework for ontology

creation, maintenance, and sharing, following princi-

ples of open use, collaborative development, and con-

tent speciﬁcity with common syntax and relations.

The OBO Foundry includes ontologies from di-

verse domains, such as agriculture, anatomy, biolog-

ical systems, and health, totaling 185 domain ontolo-

gies available on its website. These ontologies cover

concepts, relationships, and instances, integrating in-

formation from dictionaries, glossaries, and other re-

sources. To ensure semantic interoperability among

these ontologies, the Basic Formal Ontology (BFO)

is employed as a top-level ontology within the OBO

Foundry. BFO facilitates the creation of domain-specific ontologies by addressing the most general aspects of reality in a domain-independent way (Arp et al., 2015). Approximately half of the domain ontologies in the OBO Foundry use BFO, which enhances their interoperability. For instance, the Environment Ontology (ENVO) (Buttigieg et al., 2013) integrates elements from other domain ontologies that also use BFO as their top-level ontology, such as the Plant Ontology (PO) (Jaiswal et al., 2005), the Relation Ontology (RO), the Food Ontology (FoodOn) (Dooley et al., 2018), and the Chemical Entities of Biological Interest (ChEBI) (Degtyarenko et al., 2007).

2.2 Language Model

Language models are computational models that un-

derstand and generate human language by being

trained on large text datasets (Touvron et al., 2023;

Jiang et al., 2024). They use statistical learning to

encode semantic and syntactic information, facilitat-

ing a wide range of tasks such as machine translation,

sentiment analysis, and information retrieval. These

models employ word embeddings, dense vectors that

capture relationships between words and sentences, to

process and generate language meaningfully. In this context, BERT (Devlin et al., 2018) represents a significant advancement by using a transformer-based architecture to understand the bidirectional context in text. BERT thereby improves word embeddings by considering the full context of words in both directions, employing two unsupervised tasks for pre-training: Masked Language Model (MLM) and Next Sentence Prediction (NSP). In MLM, it predicts randomly masked tokens, while in NSP, it determines if one sentence logically follows another.

In this work, BERT is preferred over state-of-the-art language models like Llama (Touvron et al., 2023) and Mixtral (Jiang et al., 2024) because its architecture and pre-training approach offer specific advantages for our experiments. By analogy, employing newer language models would be like shooting a duck with a missile, since our goal is to understand the nuances of classifying domain entities into top-level ontology concepts using informal definitions. Moreover, these newer models are much bigger and require too much computational power and time to fine-tune. BERT, in turn, uses a bidirectional training strategy that allows it to use the context from both directions of a sentence, providing a deeper understanding of language structure and meaning. Additionally, BERT is small enough to be used at full floating-point precision in our resource-constrained environment, a single RTX 3060 with 12GB of VRAM.

Relation Ontology (RO) repository: https://github.com/oborel/obo-relations

2.3 Strategies for Classification of Domain Entities into Top-Level Ontology Concepts

Approaches to classify domain entities into top-level ontology concepts typically use text representation and machine learning. Domain entities are usually represented by their names, example sentences, or informal definitions. This representation is crucial for addressing challenges like polysemy, where a term may have multiple meanings. Lopes et al. (2022) combine word embeddings of terms and informal definitions to reduce polysemy, using a model that merges a feed-forward neural network with a bi-LSTM. However, this requires the term's presence in a pre-trained model. Jullien et al. (2022) suggest using a term concatenated with an example sentence as input for the BERT model, although this method faces challenges with polysemy and depends on the term's inclusion in the Brown corpus (Francis, 1965). Lopes et al. (2023) propose a more effective strategy by combining terms and informal definitions into a single text input, thus avoiding reliance on external corpora. They fine-tuned various language models focusing on concepts from the Dolce-Lite and Dolce-Lite-Plus top-level ontologies. While effective, the adaptability of this method across different ontologies and domains and under varied preprocessing and training conditions remains to be fully explored.

3 THE STUDY CASE

In this section, we delineate the scope of this work

and describe the research questions we aim to answer

in the proposed study case. Additionally, we present

how we extract the dataset from the OBO Foundry,

which we used for a cross-domain classiﬁcation of

domain entities into top-level ontology concepts de-

ﬁned in the Basic Formal Ontology (BFO).

3.1 Scope and Research Questions

This study explores the effectiveness of the pipeline

detailed in (Lopes et al., 2023) in classifying domain

entities into top-level concepts within a cross-domain

classiﬁcation scenario, incorporating top-level con-

cepts from a different ontology. Additionally, we

investigate various modiﬁcations introduced during

the input, preprocessing, and training stages of the

original pipeline. Figure 1 illustrates four distinct

pipelines, with P1 serving as our baseline and P2

to P4 representing step-wise modiﬁcations at each

stage. Through our investigation and the examination

of these pipelines, we aim to address the following

research questions:

1. Research Question: How Do the Classiﬁca-

tion Pipelines Perform Using Only the Infor-

mal Deﬁnitions as Input Rather than Combin-

ing Them with the Terms?

Explanation: In (Lopes et al., 2023), the authors advocate combining the term representing a domain entity with its informal definition in a single text sentence to address the polysemy problem and eliminate dependencies on external knowledge sources, such as the Brown Corpus or pre-trained word embeddings. However, utilizing only the informal definition also mitigates these challenges. This research question delves into the performance implications of employing both terms and informal definitions versus using only informal definitions as the input for the pipelines outlined in Figure 1.

Pipeline   Input                        Preprocessing        Training
P1         Term + informal definition   Not performed        Balanced
P2         Only informal definition     Not performed        Balanced
P3         Term + informal definition   Without stop words   Balanced
P4         Term + informal definition   Not performed        Unbalanced

Figure 1: Pipelines for classifying domain entities into top-level ontology concepts, where P1 is our baseline pipeline, and for P2 to P4, we modify each stage at a time.

2. Research Question: How Do the Classification Pipelines Perform by Removing the Stop Words from the Input Sentence?

Explanation: Stop words, such as "a," "an," "the," and "of," are words that occur frequently in textual sentences. In our perspective, these stop words play a vital role in shaping sentence structures from which language models, such as BERT, can learn patterns to distinguish the top-level concept of a domain entity. This research question examines the impact of stop words on classification results by conducting experiments both with and without them in the input sentence.

3. Research Question: How Do the Classiﬁcation

Pipelines Perform Using Balanced and Unbal-

anced Training?

Explanation: In (Lopes et al., 2023), an un-

balanced training methodology was employed to

address the characteristic class imbalance in the

datasets, where few classes have the majority of

instances. However, this strategy is computation-

ally expensive due to the size of the training set.

On the other hand, (Lopes et al., 2022) explored a

balanced training methodology, i.e., all the classes

have the same number of training instances. How-

ever, in their work, they reduced the number of

target classiﬁcation classes to the 30 most popu-

lated ones, resulting in faster training times but

poorer results due to the exclusion of signiﬁcant

training data. This research question assesses the

performance of these diverse training methodolo-

gies to comprehend their impact, considering the

four target classes utilized in this study.

4. Research Question: How Do the Classiﬁcation

Pipelines Perform with Other Top-Level On-

tology Concepts?

Explanation: Previous studies have primarily fo-

cused on evaluating classiﬁcation pipelines using

the Dolce-Lite and Dolce-Lite-Plus top-level con-

cepts. While these ontologies offer diverse con-

cepts, the generalizability of the pipeline to other

top-level ontologies, such as the Basic Formal On-

tology (BFO), remains untested. To address this

gap, our experiments target four BFO top-level

concepts: independent continuant, generically de-

pendent continuant, speciﬁcally dependent con-

tinuant, and process. Although BFO encompasses

numerous top-level concepts, our selection con-

centrates on these four, which represent all the

concepts in level 3 of the BFO taxonomy and are

subsumed by all domain-speciﬁc entities in our

dataset.

5. Research Question: How Do the Classiﬁcation

Pipelines Perform in a Cross-Domain Classiﬁ-

cation Scenario?

Explanation: Current datasets for classifying do-

main entities into top-level concepts often rely on

aligning WordNet with the Dolce-Lite-Plus top-

level ontology (Gangemi et al., 2003). While

valuable insights can be drawn from experiments

on domain-independent datasets, the majority of

domain ontologies include entities speciﬁc to their

domains that are not present in WordNet or other

broader knowledge domains. Hence, this research

question investigates the performance of the clas-

siﬁcation pipeline in a cross-domain classiﬁcation

scenario, where the training and testing datasets

originate from different knowledge domains.
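The input variants compared in research questions 1 and 2 can be illustrated with a small helper. This is a minimal sketch, not the authors' implementation: the function name `build_input`, the "term: definition" join format, the short stop-word list, and the example entity are all assumptions made for illustration only.

```python
# Illustrative stop-word list (assumption: the paper does not state which list was used).
STOP_WORDS = {"a", "an", "the", "of", "in", "or", "and", "that", "is"}


def build_input(term, definition, use_term=True, remove_stop_words=False):
    """Build the text given to the classifier for one domain entity."""
    # Assumed join format "<term>: <definition>"; the real separator may differ.
    text = f"{term}: {definition}" if use_term else definition
    if remove_stop_words:
        tokens = text.split()
        # Compare tokens without surrounding punctuation, case-insensitively.
        kept = [t for t in tokens if t.strip(".,;:()").lower() not in STOP_WORDS]
        text = " ".join(kept)
    return text


# Hypothetical entity, used only to show the three input variants.
term = "cell"
definition = "A material entity of anatomical origin."
print(build_input(term, definition))                          # term + definition
print(build_input(term, definition, use_term=False))          # definition only
print(build_input(term, definition, remove_stop_words=True))  # stop words removed
```

Keeping the variants behind two flags makes it easy to change one stage at a time, which mirrors how the pipelines in Figure 1 differ from the baseline.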

3.2 Dataset Preparation

Our research questions prompted an exploration be-

yond the Dolce-Lite-Plus and Dolce-Lite top-level

ontologies. In this context, we also assess the effec-

tiveness of the pipelines outlined in Figure 1 within a

cross-domain classiﬁcation scenario. A notable con-

tribution of this study is the introduction of a novel

dataset for classifying domain entities into top-level

ontology concepts. To achieve this, we extracted

the target classes from the Basic Formal Ontology

(BFO) top-level ontology, along with the correspond-

ing terms and informal deﬁnitions, utilizing the re-

sources available in the OBO Foundry.

To identify suitable domain ontologies within the

OBO Foundry aligning with the objectives of our

study, we employed three criteria: the presence of Ba-

sic Formal Ontology (BFO) as the top-level ontology,

the feasibility of performing reasoning, and the avail-

ability of terms and informal deﬁnitions for domain

entities. Table 1 illustrates the outcome of this selec-

tion process, wherein 81 domain ontologies were cho-

sen from the 154 listed in the OBO Foundry reposi-

tory. These selected ontologies are categorized into

12 distinct knowledge domains, contributing a total

of 218,630 domain entities for our analysis. Also, we

excluded the Phenotype knowledge domain from our

study due to the lack of consensus in the literature re-

garding the deﬁnition of a phenotype concerning the

target concepts of independent continuant or speciﬁ-

cally dependent continuant.

Table 1: The description of the knowledge domains explored in this work and the number of domain ontologies and entities in each domain.

ID      Domain                              No. of available     No. of selected      No. of
                                            domain ontologies    domain ontologies    entities
D1      Agriculture                         2                    2                    954
D2      Anatomy and development             35                   12                   37,641
D3      Biological systems                  7                    5                    44,420
D4      Chemistry and biochemistry          16                   7                    59,157
D5      Diet, metabolomics, and nutrition   6                    4                    18,463
D6      Environment                         7                    6                    8,257
D7      Health                              37                   17                   22,481
D8      Information                         11                   6                    1,415
D9      Information technology              4                    3                    1,324
D10     Investigations                      13                   11                   8,059
D11     Microbiology                        2                    1                    3,019
D12     Organisms                           14                   7                    13,440
Total                                       154                  81                   218,630

Table 2: The number of domain entities of each top-level ontology concept evaluated in this work, in which IC means Independent Continuant, SDC means Specifically Dependent Continuant, GDC means Generically Dependent Continuant and P means Process.

ID      IC        SDC      GDC      P
D1      219       327      112      296
D2      37,017    6        -        618
D3      4,196     105      99       40,020
D4      50,537    2,490    2,490    4,359
D5      16,230    1,643    350      240
D6      4,218     394      32       3,613
D7      1,603     9,411    661      10,806
D8      32        49       1,044    290
D9      8         8        1,047    261
D10     1,180     687      3,191    3,001
D11     2,996     23       -        -
D12     5,819     1,559    1,555    4,507
Total   124,055   15,983   10,581   68,011

In the OBO Foundry, many ontologies incorporate common domain entities from diverse knowledge domains, leading to instances shared among ontologies. To address this, we adopted a specific approach for identifying domain entities within each ontology. For instance, in the Agronomy Ontology, we considered only those concepts and instances bearing "AGRO" in the IRI prefix as domain entities. Consequently, while "AGRO_00001" is recognized as a domain entity in the Agronomy Ontology, "GO_00001," despite its presence in the Agronomy Ontology, is not considered a domain entity of this ontology. A similar processing approach was applied to each domain ontology used in this study.
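The prefix-based selection of domain entities described above can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' extraction code: the function name `is_domain_entity` is hypothetical, the example IRIs follow the usual OBO PURL pattern, and the identifiers are placeholders rather than real ontology terms.

```python
def is_domain_entity(iri, ontology_prefix):
    """Return True when the local name of the IRI starts with '<PREFIX>_'."""
    # Take the part after the last '/' or '#', e.g. 'AGRO_00001'.
    local_name = iri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    # Appending '_' avoids matching a shorter prefix such as 'GO' inside 'AGRO'.
    return local_name.upper().startswith(ontology_prefix.upper() + "_")


# Placeholder IRIs (assumed OBO PURL layout), as they might appear in the Agronomy Ontology.
iris = [
    "http://purl.obolibrary.org/obo/AGRO_00001",
    "http://purl.obolibrary.org/obo/GO_00001",
]
agro_entities = [iri for iri in iris if is_domain_entity(iri, "AGRO")]
print(agro_entities)
```

Filtering by prefix keeps each entity attached to the ontology that defines it, so an imported term is not counted again in every ontology that reuses it.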

Table 2 outlines the instance distribution across

domains and target classes. The Independent Con-

tinuant (IC) class emerges as the most populous,

boasting 124,055 instances, followed by the Process

(P), Speciﬁcally Dependent Continuant (SDC), and

Generically Dependent Continuant (GDC) classes

with 68,011, 15,983, and 10,581 instances, respec-

tively.
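To make the class imbalance concrete, the per-class totals reported above can be turned into shares of the whole dataset. The snippet below is only a worked calculation on the totals from Table 2; the names `CLASS_COUNTS` and `class_shares` are introduced here for illustration.

```python
# Per-class totals as reported in Table 2.
CLASS_COUNTS = {"IC": 124_055, "P": 68_011, "SDC": 15_983, "GDC": 10_581}


def class_shares(counts):
    """Return the fraction of all instances that belongs to each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = class_shares(CLASS_COUNTS)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

Independent Continuant alone accounts for more than half of the 218,630 entities, while Generically Dependent Continuant accounts for less than 5%, which is the imbalance that the balanced and unbalanced training setups handle differently.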

4 EXPERIMENTAL EVALUATION

In this section, we explain how we create the cross-

domain evaluation scenario, detailing what we used

for training, evaluating, and testing the classiﬁer.

Also, we show the results achieved by performing the

experiments with each pipeline described in Figure 1

and pointing out the result analysis according to each

research question detailed in Section 3.1.

4.1 Evaluation Procedure

To assess the effectiveness of our study case in a

cross-domain classiﬁcation scenario, we systemati-

cally excluded each domain dataset from training

and merged all the remaining datasets to train the

pipelines described in Figure 1. From that, we

used the excluded dataset from training to test each

pipeline. Also, we employed a stratiﬁed k-fold cross-

validation approach with k = 10 for leveraging the

pipeline’s performances across different subsets of the

training data. In addition, we used the Early Stopping

technique to evaluate the pipeline’s performances in

each training epoch, with a patience parameter set to

3. As the loss function, we used the categorical cross-entropy (Equation 1).

$$\mathcal{L}(\hat{y}, y) = -\sum_{k \in K} y^{(k)} \log \hat{y}^{(k)} \tag{1}$$

where $K$ is the set of classes, $y^{(k)}$ indicates whether class label $k$ is the correct classification, and $\hat{y}^{(k)}$ is the predicted probability of class $k$.
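The hold-out protocol described at the start of this subsection, where each domain is excluded from training in turn and then used for testing, can be sketched as a generator. This is a minimal illustration, not the authors' code: the function name `leave_one_domain_out` and the toy samples are assumptions, and the sketch does not cover the stratified 10-fold cross-validation or early stopping applied within the training data.

```python
def leave_one_domain_out(datasets):
    """Yield (held_out_domain, train_samples, test_samples) for every domain.

    `datasets` maps a domain ID to a list of (text, label) samples.
    """
    for held_out in datasets:
        # Training data: every sample from all other domains, merged together.
        train = [
            sample
            for domain, samples in datasets.items()
            if domain != held_out
            for sample in samples
        ]
        # Test data: only the samples of the excluded domain.
        test = list(datasets[held_out])
        yield held_out, train, test


# Toy data with made-up texts, only to exercise the split.
TOY_DATASETS = {
    "D1": [("soil: loose surface material", "IC")],
    "D2": [("growth: increase in size", "P"), ("leaf: plant organ", "IC")],
    "D3": [("color: visual quality", "SDC")],
}
splits = list(leave_one_domain_out(TOY_DATASETS))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Because the test domain never contributes training samples, each split measures how well the classifier transfers to a vocabulary it has not seen.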

To evaluate the results of each experiment, we employed the F1-score (Equation 2), with TP, FP, and FN denoting true positives, false positives, and false negatives, respectively. The experiments were conducted on a machine equipped with an Intel i7-10700 CPU (4.8GHz), 32 GB of RAM, and a GeForce RTX 3060 GPU with 12GB of VRAM.

$$F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \tag{2}$$
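Equation 2 and the macro average reported for each pipeline can be computed as shown below. This is a small sketch in plain Python: the function names and the toy labels are illustrative, and returning 0.0 when a class has no true positives, false positives, or false negatives is a convention assumed here, not something stated in the paper.

```python
def f1_score(tp, fp, fn):
    """F1-score from true positives, false positives, and false negatives (Equation 2)."""
    denominator = 2 * tp + fp + fn
    # Assumed convention: an undefined score (empty denominator) counts as 0.0.
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1-scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(f1_score(tp, fp, fn))
    return sum(scores) / len(scores)


# Toy predictions over two of the four target classes.
y_true = ["IC", "IC", "P", "P"]
y_pred = ["IC", "P", "P", "P"]
print(macro_f1(y_true, y_pred, labels=["IC", "P"]))
```

The macro average weights every class equally, so small classes such as SDC and GDC influence the reported score as much as the much larger IC class.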

4.2 Experimental Results

As previously mentioned, our experiments involved the evaluation of four pipelines: the baseline classifier (P1) and one variation for each stage (P2 to P4). The F-score results for the baseline pipeline (P1) on the 12 domain datasets are presented at the top of Figure 2. Notably, we achieved F-scores exceeding 90% for at least one class in numerous datasets, underscoring the adaptability of the pipeline proposed by Lopes et al. (2023) to cross-domain classification scenarios and to the BFO top-level ontology. The Process class demonstrated the highest average F-score, at 73%, showing promising results across diverse domains. However, the Specifically Dependent Continuant and Generically Dependent Continuant classes exhibited lower results, often attributable to the limited number of instances, as observed in the D2, D3, D6, D9, and D11 datasets. Alternatively, these lower results might be linked to the process of creating informal definitions for other datasets. Based on this analysis of the baseline results, we proceeded to evaluate each research question outlined in Section 3.1, as detailed below:

1. Research Question: How Do the Classiﬁca-

tion Pipelines Perform Using Only the Infor-

mal Deﬁnitions as Input Rather than Combin-

ing Them with the Terms?

Result Analysis: The plot of Pipeline 2 (P2) in Figure 2 demonstrates that, despite achieving F-scores above 90% for numerous domains, P2 performs worse than the baseline pipeline (P1). This suggests that relying solely on informal definitions as input sentences is less effective than combining them with entity terms, emphasizing the crucial role of including terms for training and classifying using the pipeline proposed in (Lopes et al., 2023). In other words, terms carry characteristics necessary for classifying domain entities into top-level ontology concepts. Essentially, as ontology entities' terms are established before their informal definitions during ontology development, excluding them from the training process appears counterproductive. Overall, pipeline P2 achieved an average F-score 4 percentage points below the baseline.

The source code and data are available at https://github.com/BDI-UFRGS/MultiDomainClassification

We tried to use newer models like Mixtral and Llama 2 in our experiments. However, each one would take around 125 days just to generate the embeddings for all the instances in our dataset. Also, we would have to reduce the floating-point precision to 4 bits to fit the language models on the GPU used.

Figure 2: Results achieved by performing each pipeline in a cross-domain classification scenario. The size of each mark is proportional to the number of instances of the class in the domain relative to the total across all domains. Average F-score per pipeline: P1 59%, P2 55%, P3 57%, P4 62%.

2. Research Question: How Do the Classification Pipelines Perform by Removing the Stop Words from the Input Sentence?

Result Analysis: The plot of Pipeline 3 (P3) in Figure 2 shows the results of removing stop words from the input sentences. The hypothesis was that removing stop words might enhance the focus on key terms and possibly improve the classification F-score. The results, however, paint a nuanced picture. For some domains, removing stop words resulted in a marginal increase in F-score, indicating a slight improvement in focus on relevant terms. For other domains, the impact was negligible or even negative, leading to a decrease in classification performance. This variation suggests that the role of stop words is context-dependent and can vary significantly across domains. Compared to the baseline pipeline (P1), where sentences were used with their original structure (including stop words), the overall performance of pipeline P3 is slightly worse, with an average F-score 2 percentage points lower. The analysis indicates that, while stop words are often considered "noise" or irrelevant in text processing, their role might be more complex than traditionally assumed. They can provide essential context or grammatical structure that aids in the understanding of sentences, especially in specific domains.

3. Research Question: How Do the Classiﬁcation

Pipelines Perform Using Balanced and Unbal-

anced Training?

Result Analysis: The plot of Pipeline 4 (P4) in

Figure 2 describes the results of using an unbal-

anced training scenario, i.e., where the classes

have different numbers of instances. In P4, the

hypothesis was that balanced training, with equal

representation of classes, would enhance the clas-

siﬁcation F-score across different domains. Con-

trary to this assumption, the results reveal that the

unbalanced training approach consistently outper-

formed the balanced one in all examined domains.

This led to a noticeable increase in F-score across all classes. Compared to the baseline pipeline (P1), which used balanced datasets, pipeline P4 improved the average F-score by around 3 percentage points. These findings suggest a potential paradigm shift in dataset preparation for the task of classifying domain entities into top-level ontology concepts, underscoring the importance of aligning training data with the natural class distribution.

4. Research Question: How Do the Classiﬁcation

Pipelines Perform with Other Top-Level On-

tology Concepts?

Result Analysis: Considering the performance

of various classiﬁcation pipelines across top-level

ontology concepts, it is evident that Pipeline P4,

which utilizes unbalanced training datasets, con-

sistently outperforms the others. P4 demonstrates

the highest F-scores in all categories, including

Independent Continuant, Speciﬁcally Dependent

Continuant, Generically Dependent Continuant,

and Process. On the other hand, pipeline P2,

which relies solely on informal deﬁnitions, gener-

ally shows the weakest performance, highlighting

the importance of including speciﬁc entity terms

in training. The Pipeline P3, focusing on stop-

words removal, offers mixed results with some

improvements in certain areas but does not consis-

tently outperform the baseline pipeline P1. Over-

all, the effectiveness of each pipeline varies signif-

icantly with the ontology concept, suggesting that

unbalanced training datasets are more universally

effective in this context.

5. Research Question: How Do the Classiﬁcation

Pipelines Perform in a Cross-Domain Classiﬁ-

cation Scenario?

Result Analysis: Examining the F-score values for each class within each domain reveals marked contrasts between the pipelines. For Pipeline P4, high F-scores are observed in D2 for Independent Continuant (99%) and in D11 for Specifically Dependent Continuant (96%), contrasting with the scores in D8 for Independent Continuant (14%) and Generically Dependent Continuant (88%). Pipeline P3 is effective in D5 (Independent Continuant at 94%, Specifically Dependent Continuant at 77%), yet drops in D2 (Specifically Dependent Continuant at 0%) and D7 (Independent Continuant at 25%). Pipeline P2, in turn, demonstrates strong results in D4 for Independent Continuant (96%) and in D11 for Specifically Dependent Continuant (95%), but underperforms in domains such as D9 (Independent Continuant at 23%) and D2 (Specifically Dependent Continuant at 0%). These values highlight the variable performance of each pipeline across domains, emphasizing the complexity of the cross-domain classification task and the need for tailored approaches.
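The balanced training regime compared under research question 3 gives every class the same number of training instances. The paper does not state how the balanced sets were drawn, so the sketch below assumes random undersampling to the size of the smallest class; the function name `balance_by_undersampling`, the seed, and the toy samples are likewise illustrative assumptions.

```python
import random
from collections import defaultdict


def balance_by_undersampling(samples, seed=0):
    """Return a subset of (text, label) samples with equal counts per label.

    Assumption: classes are balanced by randomly undersampling each one
    down to the size of the smallest class.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    return balanced


# Toy training set with made-up texts: 5 IC, 3 GDC, and 2 P samples.
toy_samples = (
    [(f"ic text {i}", "IC") for i in range(5)]
    + [(f"gdc text {i}", "GDC") for i in range(3)]
    + [(f"p text {i}", "P") for i in range(2)]
)
balanced_samples = balance_by_undersampling(toy_samples)
print(len(toy_samples), len(balanced_samples))
```

Under this assumption, balancing discards most instances of the large classes, which is one plausible reason why the unbalanced setup, which trains on all available data, scored higher in the experiments.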

5 CONCLUSION

The study focuses on classifying domain entities into

top-level ontology concepts of the Basic Formal On-

tology (BFO) using a proposed pipeline with the

BERT language model in a cross-domain scenario.

In this context, we examined the impact of several modifications of this pipeline, considering the input, preprocessing, and training stages. The experiments were conducted on a novel dataset derived

from OBO Foundry, involving 81 domain ontologies

across 12 knowledge domains, revealing that combin-

ing terms with informal deﬁnitions typically yields

better performance than using only informal deﬁni-

tions. In addition, removing stop words does not

signiﬁcantly enhance performance, while unbalanced

training demonstrates superior results compared to

balanced training, underscoring the importance of

natural class distribution in training datasets. This

research contributes to understanding the automated

classiﬁcation of domain entities and has implications

for developing more effective ontology-based classification systems.
In future work, we aim to explore other advanced language models in the
classification pipeline, such as Llama and Mixtral.

ACKNOWLEDGMENTS

Research supported by Higher Education Person-

nel Improvement Coordination (CAPES), code 0001,

Brazilian National Council for Scientiﬁc and Techno-

logical Development (CNPq), and Petrobras.

REFERENCES

Arp, R., Smith, B., and Spear, A. D. (2015). Building ontologies
with basic formal ontology. MIT Press.

Buttigieg, P. L., Morrison, N., Smith, B., Mungall, C. J.,

Lewis, S. E., and Consortium, E. (2013). The en-

vironment ontology: contextualising biological and

biomedical entities. Journal of biomedical semantics,

4:1–9.

Degtyarenko, K., De Matos, P., Ennis, M., Hastings, J.,
Zbinden, M., McNaught, A., Alcántara, R., Darsow, M.,
Guedj, M., and Ashburner, M. (2007). Chebi: a database and
ontology for chemical entities of biological interest.
Nucleic acids research, 36(suppl_1):D344–D350.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.

(2018). Bert: Pre-training of deep bidirectional trans-

formers for language understanding. arXiv preprint

arXiv:1810.04805.

Francis, W. N. (1965). A standard corpus of edited present-

day american english. College English, 26(4):267–

273.

Gangemi, A., Navigli, R., and Velardi, P. (2003). The on-

towordnet project: Extension and axiomatization of

conceptual relations in wordnet. In Meersman, R.,

Tari, Z., and Schmidt, D. C., editors, On The Move

to Meaningful Internet Systems 2003: CoopIS, DOA,

and ODBASE, pages 820–838, Berlin, Heidelberg.

Springer Berlin Heidelberg.

Jackson, R., Matentzoglu, N., Overton, J. A., Vita, R., Bal-

hoff, J. P., Buttigieg, P. L., Carbon, S., Courtot, M.,

Diehl, A. D., Dooley, D. M., et al. (2021). Obo

foundry in 2021: operationalizing open data princi-

ples to evaluate ontologies. Database, 2021.

Jaiswal, P., Avraham, S., Ilic, K., Kellogg, E. A., McCouch,

S., Pujar, A., Reiser, L., Rhee, S. Y., Sachs, M. M.,

Schaeffer, M., et al. (2005). Plant ontology (po): a

controlled vocabulary of plant structures and growth

stages. Comparative and functional genomics, 6(7-

8):388–397.

Jiang, A. Q., Sablayrolles, A., Roux, A., Mensch, A.,

Savary, B., Bamford, C., Chaplot, D. S., Casas, D.

d. l., Hanna, E. B., Bressand, F., et al. (2024). Mixtral

of experts. arXiv preprint arXiv:2401.04088.

Jullien, M., Valentino, M., and Freitas, A. (2022). Do

transformers encode a foundational ontology? prob-

ing abstract classes in natural language. arXiv preprint

arXiv:2201.10262.

Lopes, A., Carbonera, J., Schmidt, D., and Abel, M. (2022).

Predicting the top-level ontological concepts of do-

main entities using word embeddings, informal deﬁ-

nitions, and deep learning. Expert Systems with Ap-

plications, 203:117291.

Lopes, A., Carbonera, J., Schmidt, D., Garcia, L., Ro-

drigues, F., and Abel, M. (2023). Using terms

and informal deﬁnitions to classify domain entities

into top-level ontology concepts: An approach based

on language models. Knowledge-Based Systems,

265:110385.

Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., and

Chadha, A. (2024). A systematic survey of prompt

engineering in large language models: Techniques and

applications. arXiv preprint arXiv:2402.07927.

Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W.,

Ceusters, W., Goldberg, L. J., Eilbeck, K., Ireland, A.,

Mungall, C. J., et al. (2007). The obo foundry: coor-

dinated evolution of ontologies to support biomedical

data integration. Nature biotechnology, 25(11):1251–

1255.

Studer, R., Benjamins, V. R., and Fensel, D. (1998). Knowl-

edge engineering: Principles and methods. Data &

knowledge engineering, 25(1-2):161–197.

Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi,

A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava,

P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C. C.,

Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu,

J., Fu, W., Fuller, B., Gao, C., Goswami, V., Goyal,

N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H.,

Kardas, M., Kerkez, V., Khabsa, M., Kloumann, I.,

Korenev, A., Koura, P. S., Lachaux, M.-A., Lavril, T.,

Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X.,

Mihaylov, T., Mishra, P., Molybog, I., Nie, Y., Poul-

ton, A., Reizenstein, J., Rungta, R., Saladi, K., Schel-

ten, A., Silva, R., Smith, E. M., Subramanian, R.,

Tan, X. E., Tang, B., Taylor, R., Williams, A., Kuan,

J. X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan, A.,

Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R.,

Edunov, S., and Scialom, T. (2023). Llama 2: Open

foundation and ﬁne-tuned chat models.

ICEIS 2024 - 26th International Conference on Enterprise Information Systems
