semantically (Kamdar and Dumontier, 2015; Kanza
and Frey, 2019; Ruan et al., 2019). A KG systematizes
data resources and their interrelationships.
The Resource Description Framework (RDF) serves
as a standard for describing this semantically enriched
data (Candan et al., 2001; Rossanez and dos Reis,
2019).
In this article, we address the structuring of clinical
data by combining NLP with KGs to transform clinical
dialogues into a semantic representation based on RDF
triples. Our proposed methodology evaluates the
clinical relevance of input textual data (transcribed
from audio recordings in telemedicine). We explore
advanced techniques for clinical data extraction and
summarization, such as a fine-tuned Generative
Pre-trained Transformer (GPT)-NeoX 20B model.
Building on this methodology, we designed and
developed an original software solution that simplifies
clinical documentation and enhances medical
decision-making.
Our experimental evaluation assesses how well automatic
clinical text classification identifies relevant clinical
texts within the overall transcriptions. Our solution
explores LLMs and few-shot learning for this
purpose. In addition, we present our results on RDF
triple extraction from textual data (the relevant clinical
texts). We obtained relevant findings by exploring few-shot
prompting for identifying RDF triples.
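To make the idea of few-shot prompting for triple identification concrete, the following is a minimal sketch in Python. The demonstration sentences, the `(subject | predicate | object)` output format, and the names `build_prompt` and `parse_triples` are assumptions introduced here for illustration; they do not reproduce the prompts or data used in this work, and the call to the language model itself is omitted.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical demonstrations (not taken from any real dataset).
FEW_SHOT_EXAMPLES: List[Tuple[str, List[Triple]]] = [
    ("The patient reports a headache for three days.",
     [("patient", "hasSymptom", "headache"),
      ("headache", "hasDuration", "three days")]),
    ("The doctor prescribed ibuprofen.",
     [("doctor", "prescribed", "ibuprofen")]),
]


def format_triples(triples: List[Triple]) -> str:
    """Render triples one per line in the assumed '(s | p | o)' format."""
    return "\n".join(f"({s} | {p} | {o})" for s, p, o in triples)


def build_prompt(text: str,
                 examples: List[Tuple[str, List[Triple]]] = FEW_SHOT_EXAMPLES) -> str:
    """Assemble an instruction, the demonstrations, and the new input text."""
    parts = ["Extract RDF-style triples (subject | predicate | object) from the text."]
    for sentence, triples in examples:
        parts.append(f"Text: {sentence}\nTriples:\n{format_triples(triples)}")
    # The prompt ends with an open 'Triples:' slot for the model to complete.
    parts.append(f"Text: {text}\nTriples:")
    return "\n\n".join(parts)


def parse_triples(completion: str) -> List[Triple]:
    """Keep only well-formed '(s | p | o)' lines from a model completion."""
    triples: List[Triple] = []
    for line in completion.splitlines():
        line = line.strip()
        if not (line.startswith("(") and line.endswith(")")):
            continue
        fields = [field.strip() for field in line[1:-1].split("|")]
        if len(fields) == 3 and all(fields):
            triples.append((fields[0], fields[1], fields[2]))
    return triples
```

In this sketch, the string returned by `build_prompt` would be sent to an LLM, and `parse_triples` would filter the returned completion, discarding any line that does not match the assumed format.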
The remainder of this article is organized as fol-
lows: Section 2 introduces underlying concepts and
presents the related work. Section 3 details our designed
methodology. Section 4 presents key aspects of
the software tool we developed for clinical data
documentation. Section 5 describes our experimental
evaluation and presents the achieved results, which
are discussed in Section 6. Section 7 summarizes our
findings and points out directions for future research.
2 BACKGROUND
KGs structure human knowledge by modeling the relationships
between real-world entities (Ehrlinger and Wöß, 2016).
They use RDF triples as their underlying data
model. Triples, made up of subject, predicate,
and object, constitute the fundamental structure of
a KG. This formal computational representation is
essential for describing and understanding information
about diseases. In the context of clinical data, proper
data representation and integration are crucial. They
allow healthcare practitioners and researchers to
visualize interrelationships between concepts and findings
(Auer et al., 2007) and to relate their research to that
of others. By observing these relationships, new
hypotheses can be formulated, advancing domain knowledge
(Rossanez et al., 2020).
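The subject, predicate, and object structure described above can be sketched with plain Python data structures. This is only an illustration: the namespace `http://example.org/clinical/`, the resource names, and the helper functions are hypothetical, and the serializer covers only a simplified subset of the N-Triples syntax (IRIs and plain string literals with quote and backslash escaping).

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str                  # IRI of the resource being described
    predicate: str                # IRI of the relationship
    obj: str                      # IRI or literal value
    obj_is_literal: bool = False  # True when obj is a plain string literal


EX = "http://example.org/clinical/"  # hypothetical namespace


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a simplified N-Triples line."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


def objects_of(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


# A tiny graph: a patient has a symptom, and the symptom has a duration.
graph = [
    Triple(EX + "patient1", EX + "hasSymptom", EX + "Headache"),
    Triple(EX + "Headache", EX + "hasDuration", "three days", obj_is_literal=True),
]

for triple in graph:
    print(to_ntriples(triple))
```

Because the object of one triple can be the subject of another (here, `Headache`), a set of triples forms a graph whose relationships can be followed with simple lookups such as `objects_of`.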
In the biomedical field, KGs have been gaining
prominence. Recent initiatives propose innovative
approaches for classification and search strategies,
from user interaction to machine learning. For exam-
ple, studies have converted neuroscience information
into RDF format (Lam et al., 2007), whereas others
have developed frameworks that integrate information
from multiple domains (Rossanez et al., 2020).
In medicine, clinical transcriptions play a crucial
role in documenting clinical information. The digital
revolution has intensified this relevance by enabling the
conversion of clinical dialogues into structured data. This
transformation enhances evidence-based decisions and
improves the continuity of patient care. However,
medical language, full of jargon and specific
terminology, poses challenges to the analysis of these
transcripts (Exner and Nugues, 2012). Large Language
Models (LLMs) have emerged as a response to
these challenges, improving natural language analysis.
LLMs have been essential in creating KGs in the
biomedical field, combining efficiency and precision
(Lam et al., 2007; Exner and Nugues, 2012; Manning
et al., 2014).
The extraction of RDF triples from texts has become
a central issue. In existing studies, techniques
such as Semantic Role Labeling (SRL) are applied to
map entities and determine relationships (Exner and
Nugues, 2012). However, automatically generating KGs
from scientific literature remains challenging: the
literature has a particular and diverse writing style
characterized by long sentences, abbreviations, and
technical terms. NLP tools must therefore be adequately
trained for this specific lexicon.
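As a rough illustration of how SRL output can be mapped to a triple, the sketch below assumes that a labeler has already produced a frame with PropBank-style role labels (`ARG0` for the agent, `V` for the predicate, `ARG1` for the patient or theme). The function `frame_to_triple` and the input format are assumptions made for this example; it does not reproduce the pipeline of Exner and Nugues (2012), and the SRL step itself is not shown.

```python
from typing import Dict, Optional, Tuple


def frame_to_triple(frame: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Map one SRL frame to a (subject, predicate, object) triple.

    Returns None when the frame lacks an agent, a predicate, or a theme,
    since such frames cannot be expressed as a complete triple.
    """
    subject = frame.get("ARG0")
    predicate = frame.get("V")
    obj = frame.get("ARG1")
    if not (subject and predicate and obj):
        return None
    return (subject, predicate, obj)


# Hypothetical frame for the sentence "The doctor prescribed ibuprofen."
frame = {"ARG0": "the doctor", "V": "prescribed", "ARG1": "ibuprofen"}
print(frame_to_triple(frame))
```

In practice, the extracted text spans would still need to be linked to entities and predicates of a vocabulary or ontology before they can be stored as RDF.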
Building a single KG that covers all diseases is challenging.
For this reason, Sheng et al. (2019) proposed DEKGB,
an efficient and extensible framework to build KGs for
specific diseases based on doctors’ knowledge. They
described the process by extending an existing health
KG to include a new disease.
In this work, we present an original exploration of the
potential of LLMs in generating RDF triples from medical
consultation transcripts. We present results that highlight
the effectiveness of our approach and establish novel
findings for the construction of KGs in the medical
domain.
3 METHODOLOGY
This section describes the conceptual methodology,
as illustrated in Figure 1. We detail the conducted
KEOD 2023 - 15th International Conference on Knowledge Engineering and Ontology Development