Automated Medical Reporting: From Multimodal Inputs to Medical Reports through Knowledge Graphs

Lientje Maas, Adriaan Kisjes, Iman Hashemi, Floris Heijmans, Fabiano Dalpiaz (1, a), Sandra Van Dulmen (2, b) and Sjaak Brinkkemper (1, c)

1 Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands

2 NIVEL (Netherlands Institute for Health Services Research), Utrecht, The Netherlands

Keywords:

Healthcare Workflow Management, Electronic Medical Record, Automated Reporting, Dialogue Interpretation, Knowledge Graphs, Patient Medical Graph.

Abstract:

Care providers generally experience a high workload mainly due to the large amount of time required for adequate documentation. This paper presents our visionary idea of real-time automated medical reporting through the integration of speech and action recognition technology with knowledge-based summarization of the interaction between care provider and patient. We introduce the Patient Medical Graph as a formal representation of the dialogue and actions during a medical consultation. This knowledge graph represents human anatomical entities, symptoms, medical observations, diagnoses and treatment plans. The formal representation enables automated preparation of a consultation report by means of sentence plans to generate natural language. The architecture and functionality of the Care2Report prototype illustrate our vision of automated reporting of human communication and activities using knowledge graphs and NLP tools.

1 INTRODUCTION

Care providers (CPs) are required to accurately report patient information. As a primary communication tool between CPs, medical records are necessary for good patient care. However, recording and maintaining patient medical information in the electronic medical record (EMR) is time-consuming. A more efficient way of reporting is required to cope with high workload in healthcare while preserving quality of the patient data.

To reduce documentation time, the use of speech recognition in medical reporting has been studied extensively. Recently, Chiu et al. developed a speech recognition system for transcription of medical conversations, reaching a word accuracy of 81.7% (Chiu et al., 2017). Most studies focus on dictation for reporting after a consultation (Ajami, 2016). However, dictation is only used by 1% of medical staff in the Netherlands (Luchies et al., 2018). Klann and Szolovits performed initial work to capture the patient-CP dialogue with speech recognition and automatically extract clinical meaning (Klann and Szolovits, 2009). Further, the project BabyTalk aimed to automatically generate textual summaries of temporal clinical data from physiological signals (Portet et al., 2009). Automated medical reporting is the visionary goal of our Care2Report (C2R) research program (see www.care2report.nl). To achieve this, state-of-the-art speech and action recognition technology is combined with semantic interpretation of data through knowledge graphs. This enables automatic preparation of a consultation report that is checked by the CP (and, if relevant, the patient) before it is uploaded to the EMR. Our solution aims to substantially reduce administrative load and improve personal engagement in healthcare. Note that we do not provide decision support but solely report consultations.

a https://orcid.org/0000-0003-4480-3887

b https://orcid.org/0000-0002-1651-7544

c https://orcid.org/0000-0002-2977-8911

This paper is organized as follows. The next section describes our approach to enable automated medical reporting. Section 3 provides more in-depth information about the formal representation of events and situations during medical consultations. Section 4 presents the architecture and functionality of the system that is under development. Finally, the status of our research and an outlook are described in Section 5.

Maas, L., Kisjes, A., Hashemi, I., Heijmans, F., Dalpiaz, F., Van Dulmen, S. and Brinkkemper, S.

Automated Medical Reporting: From Multimodal Inputs to Medical Reports through Knowledge Graphs.

DOI: 10.5220/0010261605090514

In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF, pages 509-514

ISBN: 978-989-758-490-9


[Figure 1 flowchart. Box labels: Voice dialogue recording, Action and treatment recording, Measurement recording; Raw text preprocessing, Raw video preprocessing, Raw data preprocessing; Formal representation; Reasoning; Report preparation; Patient; EMR. The boxes are grouped into Stage 1 to Stage 4.]

Figure 1: Flowchart of the process of automated medical reporting.

2 APPROACH

Our main research challenge is to integrate state-of-the-art multimodal recognition technology with knowledge representation and reasoning into one software platform. Overall, the process consists of four stages (illustrated in Fig. 1):

1. Transformation of audio, video and sensor data from medical consultations into text using existing speech and action recognition technology.

2. Formal representation of situations, measurements and treatments based on multimodal input combined with semantic technology.

3. Generation of medical reports using conventions in specific medical domains.

4. Report completion, checking by CP, and uploading through a generic EMR-interface.

We develop a generic hardware and software platform with a non-intrusive recording device that combines microphone, camera and sensor technology to recognize situations and actions as reliably as possible. Sensor technology enables wireless connection with healthcare domotics, e.g., a thermometer. The platform thus receives multimodal input: audio, video and sensor modalities. Speech recognition transforms medical dialogues into text, action recognition captures examinations and treatments, and sensor data provide the results of medical measurements.

2.1 Multimodal Knowledge Integration

To interpret the raw data recorded during the medical consultation, we model it as a knowledge graph to enhance semantic reasoning and querying (Antoniou et al., 2012). We refer to all interpreted information from the consultation as consultation knowledge.

Although the interpretation of unconstrained dialogue text can be problematic, we are in the fortunate circumstance that detailed knowledge about the context of the utterances is available through so-called background knowledge. For most medical consultations, the condition for which the patient is treated is known and the corresponding medical guideline is employed for a more accurate interpretation (Peleg, 2013; Sutton and Fox, 2003). This helps to resolve ambiguity and cope with incomplete or noisy input. Access to the medical record of the patient is of similar use. Additionally, we exploit the large corpus of medical background knowledge that is available. Medical ontologies (SNOMED, ICD-10, LOINC) and large medical knowledge graphs (Drugbank, SIDER, AERS) are utilized to disambiguate the text. This is particularly helpful for cases where the patient's condition is only partially known or vague.

To integrate information from the multimodal sources, the C2R system constructs a so-called medical consultation timeline to log a medical consultation (e.g., measurements, diagnosis, treatments). The situations that stem from the occurrence of events are stored along with their time range, so that events recognized in one modality can support recognition in another. For example, if a CP verbally announces that he or she is going to listen to a patient's heart (audio input), it can be foreseen that a stethoscope will be used (video input). The integration of inputs leads to the complete modeled consultation knowledge in a knowledge graph populated by semantic triples ⟨subject, predicate, object⟩ (Rohloff et al., 2007), from which a report is generated.
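The timeline idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the C2R implementation: the `Event` class, the `EXPECTED_OBJECT` table, the `confirm_actions` function, the 30-second slack and the `performed` predicate are all hypothetical names and values chosen for the example. It shows how an announced action (audio) can be confirmed by a matching object detection (video) whose time range lies close enough on the timeline, yielding a triple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    modality: str  # "audio", "video" or "sensor"
    label: str     # what the recognizer reported (hypothetical labels)
    start: float   # seconds since the start of the consultation
    end: float


def overlaps(a: Event, b: Event, slack: float = 0.0) -> bool:
    """True if the time ranges of a and b overlap, tolerating a gap of `slack` seconds."""
    return a.start <= b.end + slack and b.start <= a.end + slack


# Hypothetical background knowledge: announced action -> object expected on video.
EXPECTED_OBJECT = {"listen_to_heart": "stethoscope"}


def confirm_actions(timeline, slack=30.0):
    """Return a triple for each announced action that a video event confirms."""
    triples = []
    for spoken in timeline:
        if spoken.modality != "audio":
            continue
        expected = EXPECTED_OBJECT.get(spoken.label)
        if expected is None:
            continue
        for seen in timeline:
            if (seen.modality == "video" and seen.label == expected
                    and overlaps(spoken, seen, slack)):
                triples.append(("care_provider", "performed", spoken.label))
                break
    return triples


timeline = [
    Event("audio", "listen_to_heart", 120.0, 123.0),
    Event("video", "stethoscope", 130.0, 160.0),
    Event("video", "otoscope", 300.0, 320.0),
]
print(confirm_actions(timeline))
```

With the default slack the stethoscope detection confirms the announcement; with `slack=0.0` the two time ranges no longer overlap and no triple is produced.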

3 PATIENT MEDICAL GRAPH

Medical consultations follow a general structure: opening, history taking, physical examination, evaluation, treatment recommendations and closing (Maynard and Heritage, 2005). During history taking and physical examination the presence of signs and symptoms is determined, which are evaluated to determine a diagnosis and treatment plan. To formally represent the collected information, we define the Patient Medical Graph (PMG) as the knowledge graph of the patient's anatomy complemented with evaluated signs and symptoms with associated diagnosis and treatment plan.

HEALTHINF 2021 - 14th International Conference on Health Informatics

[Figure 2 graph. Node labels: patient, body, head, ears, left ear, outer ear, canal, auricle, eardrum; pain, drainage, hearing loss, redness, swelling, scars, intact; observation-1 to observation-7 with values such as 7/10, mild, yes and duration 4 days; otitis externa; treatment-2, hypercortison 3 drops t.i.d. Edge labels: isPartOf, hasSymp, hasSign, obsSymp, obsSign, obsValue, hasValue, diagnosedWith, hasTreatments, treatedWith. Subgraph labels: PAG, PSG, POG, PDG, PTG.]

Figure 2: Excerpt of a PMG based on a consultation concerning otitis externa (external ear infection). For clarity, each of the five subgraphs is shown in a different color.

The PMG serves as an internal representation of the consultation knowledge. An example of the PMG for a fictitious external ear infection (otitis externa) consultation is presented in Fig. 2. It consists of five subgraphs: PMG = PAG ∪ PSG ∪ POG ∪ PDG ∪ PTG. We will now formally define each subgraph (see also Tables 1 and 2) and illustrate with examples from Fig. 2.

The human anatomy is the starting point of the Patient Anatomy Graph (PAG), representing all human anatomical entities. The PAG knowledge graph is universal for each patient, apart from gender differences. Existing ontologies are used as reference, e.g., the Foundational Model of Anatomy (Rosse and Mejino Jr, 2003). The PAG is complemented with the Patient Symptom Graph (PSG), representing signs and symptoms associated with specific anatomical entities. Medical guidelines build the PSG by providing lists of signs and symptoms occurring in specific medical domains (Peleg, 2013). The Patient Observation Graph (POG) assigns values to the signs and symptoms based on observations during the medical consultation. The observations connect the values to a certain sign or symptom (e.g., observation-1 observes symptom pain with value 7/10), appearing as (green) triangles in the POG. Additional characteristics are also in the POG, such as the time of occurrence (e.g., observation-1 of pain 7/10 has had duration 4 days).

The PMG can be seen as the instance level (A-Box) of an ontology. Due to space limitations, we do not discuss the corresponding T-Box that defines the entity and relationship types.

Next, the graph is complemented with the diagnosis made by the CP in the Patient Diagnosis Graph (PDG). Based on observations (green), the diagnosis otitis externa is given (red). Finally, we complement the graph with the Patient Treatment Graph (PTG) based on the interpreted treatment plan in the consultation. We consider any treatment in its broadest sense: not only medication, but also referral to a specialist or additional tests.
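As a rough illustration of how the five subgraphs fit together, the sketch below stores a small part of the otitis externa example as triples tagged with the subgraph they belong to. It is a simplified assumption rather than the authors' data model: the exact edge placement, the `duration` predicate and the helper names `subgraph`, `ancestors` and `observed_values` are chosen for this example only.

```python
from collections import defaultdict

# Each entry is (subgraph, subject, predicate, object); a simplified excerpt.
PMG = [
    ("PAG", "left-ear", "isPartOf", "ears"),
    ("PAG", "ears", "isPartOf", "head"),
    ("PSG", "left-ear", "hasSymp", "pain"),
    ("POG", "observation-1", "obsSymp", "pain"),
    ("POG", "observation-1", "obsValue", "7/10"),
    ("POG", "observation-1", "duration", "4 days"),
    ("PDG", "patient", "diagnosedWith", "otitis externa"),
    ("PTG", "patient", "treatedWith", "treatment-2"),
]


def subgraph(pmg, name):
    """Return the triples of one subgraph, e.g. 'PAG' or 'POG'."""
    return [(s, p, o) for g, s, p, o in pmg if g == name]


def ancestors(pmg, entity):
    """Follow isPartOf edges upward (assumes each entity has one direct parent)."""
    parent = {s: o for s, p, o in subgraph(pmg, "PAG") if p == "isPartOf"}
    chain = []
    while entity in parent:
        entity = parent[entity]
        chain.append(entity)
    return chain


def observed_values(pmg):
    """Map each observed symptom to the value assigned by its observation."""
    by_observation = defaultdict(dict)
    for s, p, o in subgraph(pmg, "POG"):
        by_observation[s][p] = o
    return {
        props["obsSymp"]: props["obsValue"]
        for props in by_observation.values()
        if "obsSymp" in props and "obsValue" in props
    }


print(ancestors(PMG, "left-ear"))
print(observed_values(PMG))
```

Tagging every triple with its subgraph keeps the union PMG = PAG ∪ PSG ∪ POG ∪ PDG ∪ PTG explicit while still allowing queries over a single subgraph.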

3.1 Populating the PMG

Complementing the PAG ∪ PSG with the POG, PDG and PTG requires interpretation of the consultation. Observations from test scenarios indicate that the key parts in the consultation dialogue are typically uttered in short standard phrases. We aim to capture the medical dialogue through a library of linguistic patterns with placeholders. Medical guidelines are the starting point for identification of these patterns. The placeholders are filled in using part-of-speech tagging and dependency parsing in combination with regular expressions, after which semantic triples are deduced. A similar method has been successfully used for automated evaluation of eligibility criteria for clinical trials (Milian et al., 2015).
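A pattern library with placeholders might look roughly as follows. This sketch uses regular expressions only and leaves out the part-of-speech tagging and dependency parsing mentioned above; the two patterns, the `complaint` subject and the function name `extract_triples` are hypothetical and serve only to show how filled placeholders turn into triples.

```python
import re

# Each pattern pairs a regular expression with named placeholders and a
# triple template that is filled from the matched placeholder values.
PATTERNS = [
    (r"I have (?P<symptom>pain|drainage|hearing loss) in my (?P<side>left|right) ear",
     ("{side}-ear", "hasSymp", "{symptom}")),
    (r"for (?:about )?(?P<duration>\d+ (?:day|week)s?)",
     ("complaint", "duration", "{duration}")),
]


def extract_triples(utterance):
    """Apply every pattern to the utterance and fill its triple template."""
    triples = []
    for regex, template in PATTERNS:
        for match in re.finditer(regex, utterance, flags=re.IGNORECASE):
            slots = {name: value.lower() for name, value in match.groupdict().items()}
            triples.append(tuple(part.format(**slots) for part in template))
    return triples


print(extract_triples("I have pain in my left ear."))
print(extract_triples("It has been like this for about 4 days."))
```

Utterances that match no pattern, such as a bare "No.", yield no triples in this sketch; interpreting such answers would require the dialogue context.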

3.2 Report Generation

From the populated PMG, a report of the consultation is generated. Medical reports generally contain short and simple sentences, which enhances automated generation. We are developing a natural language generation component of our system based on the NaturalOWL system (Androutsopoulos et al., 2013), illustrated in Fig. 3. Template sentence plans are specified for the relevant relations in the PMG. We will determine the requirements and conventions regarding medical reporting to identify information that is relevant to report and study filtering methods for report texts.

Table 1: Definition of sets required to define the PMG.

Set  Description
P    all patients
A    all anatomical entities of the human body
S    all medical signs and symptoms
V    all possible values to be assigned to s ∈ S
O    all medical observations
D    all medical diagnoses
T    all medical treatments

Table 2: Formal definitions of the subgraphs comprising the PMG.

Graph  Vertices    Typed edges
PAG    A           {(a1, a2) | a1, a2 ∈ A ∧ a1 is a direct anatomical subpart of a2}
PSG    A ∪ S       {(a, s) | a ∈ A ∧ s ∈ S}, i.e., all signs and symptoms of a
POG    O ∪ S ∪ V   {(o, s), (o, v), (s, v) | o ∈ O ∧ s ∈ S ∧ v ∈ V}, i.e., all observations
PDG    P ∪ D       {(p, d) | p ∈ P ∧ d ∈ D}, i.e., all diagnoses for patient p
PTG    P ∪ T       {(p, t) | p ∈ P ∧ t ∈ T}, i.e., all treatment plans for patient p

The sentence plans consist of a sequence of slots along with information on how to fill those in. These plans lead to separate sentences, which are aggregated into longer ones based on rules. In addition, referring expressions are generated to improve readability. After the report is generated, the CP checks it for completeness and correctness.
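The slot filling and aggregation steps can be illustrated with a small sketch. It is not NaturalOWL and not the C2R generator: the plan texts in `PLANS`, the aggregation rule (merge triples that share subject and predicate) and the function names are assumptions made for this example, and referring expression generation is left out.

```python
# Hypothetical sentence plans: one template with slots per relation type.
PLANS = {
    "hasSymp": "Patient reports {objects} in the {subject}.",
    "diagnosedWith": "Diagnosis: {objects}.",
}


def join_items(items):
    """Render ['a'] as 'a' and ['a', 'b', 'c'] as 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def generate_report(triples):
    """Fill one sentence plan per (subject, predicate) group of triples."""
    grouped = {}
    for subject, predicate, obj in triples:
        if predicate in PLANS:  # relations without a plan are not reported
            grouped.setdefault((subject, predicate), []).append(obj)
    sentences = [
        PLANS[predicate].format(subject=subject.replace("-", " "),
                                objects=join_items(objects))
        for (subject, predicate), objects in grouped.items()
    ]
    return " ".join(sentences)


triples = [
    ("left-ear", "hasSymp", "pain"),
    ("left-ear", "hasSymp", "drainage"),
    ("patient", "diagnosedWith", "otitis externa"),
]
print(generate_report(triples))
# → Patient reports pain and drainage in the left ear. Diagnosis: otitis externa.
```

Grouping before filling the plan is what turns two separate symptom sentences into one aggregated sentence.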

4 Care2Report PROTOTYPE

To realize our vision, a prototype is under development that takes multimodal input and outputs a draft report. It transforms speech to text, recognizes medical objects from video, and transforms sensor signals to measurement data. Formal knowledge representation based on medical guidelines and sentence composition are implemented for a selected domain: medical problems related to the ear. Starting with a small domain allows us to study and test our methods on a concrete specification of, e.g., the PAG ∪ PSG, and the specific background knowledge improves data interpretation.

4.1 Architecture

The prototype is based on a microservice architecture (Klock et al., 2017). Splitting large unimodal analyzers (e.g., audio analyzer, video analyzer, and domotics analyzer) into smaller microanalyzers solves interdependency complications while maintaining a loosely coupled system. Each microanalyzer has a predefined input and output set, which allows for simple configurability and future extensibility. A microanalyzer controller controls the analysis process and ensures that all execution constraints are satisfied.
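One way to picture such a controller is as a scheduler that runs a microanalyzer once all of its declared inputs are available. The sketch below is an assumption about how this could work, not the prototype's code (which uses gRPC services): the `Microanalyzer` class, the `run_pipeline` function and the three stub analyzers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Microanalyzer:
    name: str
    inputs: frozenset             # data keys that must exist before running
    outputs: frozenset            # data keys this analyzer produces
    run: Callable[[dict], dict]   # maps available data to new data


def run_pipeline(analyzers, data):
    """Run every analyzer once, as soon as all of its inputs are available."""
    data = dict(data)
    pending = list(analyzers)
    order = []
    while pending:
        ready = [a for a in pending if a.inputs <= set(data)]
        if not ready:
            missing = {a.name: sorted(a.inputs - set(data)) for a in pending}
            raise RuntimeError(f"unsatisfied inputs: {missing}")
        for analyzer in ready:
            result = analyzer.run(data)
            data.update({key: result[key] for key in analyzer.outputs})
            order.append(analyzer.name)
            pending.remove(analyzer)
    return data, order


# Stub analyzers, deliberately registered in reverse dependency order.
analyzers = [
    Microanalyzer("extract", frozenset({"annotations"}), frozenset({"triples"}),
                  lambda d: {"triples": [("left-ear", "hasSymp", "pain")]}),
    Microanalyzer("annotate", frozenset({"transcript"}), frozenset({"annotations"}),
                  lambda d: {"annotations": d["transcript"].split()}),
    Microanalyzer("transcribe", frozenset({"audio"}), frozenset({"transcript"}),
                  lambda d: {"transcript": "I have pain in my left ear"}),
]

result, order = run_pipeline(analyzers, {"audio": b"raw audio bytes"})
print(order)
```

Because each analyzer only declares its input and output sets, a new analyzer can be added without changing the others, and an analyzer whose inputs can never be produced is reported instead of silently skipped.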

4.2 Input Analysis and Report Generation

The system contains a database whose data structure corresponds to the medical consultation timeline described in Section 2. Triples to populate the PMG are extracted from dialogue text using linguistic tools. Grammatical annotation of dialogue sentences is used to extract concepts and relations for triple creation. We envision more rigorous methods in the future as described in Section 3. Video analysis is used to identify movement of medical objects (e.g., a stethoscope) to indicate utilization by the CP. Healthcare domotics send data from medical measurements to the system via Bluetooth. The relevant input is added to the (prebuilt) PAG ∪ PSG to form the complete PMG comprising the modeled consultation knowledge. A report is then generated based on sentence plans, following the procedure described in Section 3, which is developed for the ear domain. The stages of the process are illustrated in Fig. 3 for the ear infection example.

4.3 Evaluation

We are currently building a large corpus of data including recordings of both simulated and real medical consultations. Corresponding medical reports are written manually by medical professionals to compare with the automatically generated reports. The data can be partitioned into a training set and a test set, enabling training and evaluation of the system.

4.4 Technological Platforms

The front end of the system runs on the Windows UWP platform and is mainly written in C#. The back end runs primarily on .NET Core and is written in C#. The analyzers are written in Python, using gRPC for communication between services/modules. Google Cloud Speech-to-Text service transcribes the audio and linguistic annotation is handled by Python-Frog. For video analysis the OpenCV and the YOLO libraries are used. Medical guidelines are modeled in PROforma. Protégé facilitates ontology development and triples are stored and managed with Stardog.

[Figure 3. Left, dialogue transcription: "CP: What brings you here today? P: I have pain in my left ear. CP: Ah so your ear hurts. Do you feel that you also can hear less than usual? P: No." Middle, PMG excerpt with node labels Patient, Head, Right Ear, Left Ear, Symptom, Hearing Loss, Drainage, Pain, the instances Left Ear and Pain, and the edge label hasSymptom.]

Figure 3: Example showing part of a transcription of the CP - patient dialogue (left), the resulting PMG (middle) and the sentence plan for report generation (right).

5 RESEARCH OUTLOOK

So far, we have presented our grand vision and the implementation of our basic ideas in the first C2R prototype. To reach our proposed objectives, we need to overcome several challenges, among others in the development of a robust architecture that is independent of input technology, in the semantic interpretation of input whose terminology and procedures differ between hospitals, and in striking a balance between required expressiveness and computational demands in constructing a formal representation of the transcriptions.

Our future research will focus on device integration for high-quality multimodal recognition (stage 1 in Fig. 1), on methods to build and populate the PMG (stage 2 in Fig. 1), and on methods to filter out irrelevant information from medical consultations (stage 3 in Fig. 1). Our preliminary research and results give us confidence that our ambitious goal of fully automated medical reporting is achievable.

ACKNOWLEDGEMENTS

We thank the students of the software project teams

eli and KettleHawks, Marjan van den Akker,

Lennart Herlaar and Sabine Molenaar for their support.

REFERENCES

Ajami, S. (2016). Use of speech-to-text technology for documentation by healthcare providers. The National Medical Journal of India, 29(3):148–152.

Androutsopoulos, I., Lampouras, G., and Galanis, D. (2013). Generating natural language descriptions from OWL ontologies: the NaturalOWL system. Journal of Artificial Intelligence Research, 48:671–715.

Antoniou, G., Groth, P., van Harmelen, F., and Hoekstra, R. (2012). A Semantic Web Primer. The MIT Press, Cambridge, Massachusetts, 3 edition.

Chiu, C.-C., Tripathi, A., Chou, K., Co, C., Jaitly, N., Jaunzeikare, D., Kannan, A., Nguyen, P., Sak, H., Sankar, A., et al. (2017). Speech recognition for medical conversations. arXiv preprint arXiv:1711.07274.

Klann, J. G. and Szolovits, P. (2009). An intelligent listening framework for capturing encounter notes from a doctor-patient dialog. BMC Medical Informatics and Decision Making, 9(1).

Klock, S., van der Werf, J. M., Guelen, J. P., and Jansen, S. (2017). Workload-based clustering of coherent feature sets in microservice architectures. In ICSA, pages 11–20.

Luchies, E., Spruit, M., and Askari, M. (2018). Speech technology in Dutch health care: A qualitative study. In BIOSTEC, volume 5, pages 339–348.

Maynard, D. W. and Heritage, J. (2005). Conversation analysis, doctor–patient interaction and medical communication. Medical Education, 39(4):428–435.

Milian, K., Hoekstra, R., Bucur, A., ten Teije, A., van Harmelen, F., and Paulissen, J. (2015). Enhancing reuse of structured eligibility criteria and supporting their relaxation. Journal of Biomedical Informatics, 56:205–219.

Peleg, M. (2013). Computer-interpretable clinical guidelines: a methodological review. Journal of Biomedical Informatics, 46(4):744–763.

Portet, F., Reiter, E., Gatt, A., Hunter, J., Sripada, S., Freer, Y., and Sykes, C. (2009). Automatic generation of textual summaries from neonatal intensive care data. Artificial Intelligence, 173(7-8):789–816.

Rohloff, K., Dean, M., Emmons, I., Ryder, D., and Sumner, J. (2007). An evaluation of triple-store technologies for large data stores. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems", pages 1105–1114. Springer.

Rosse, C. and Mejino Jr, J. L. (2003). A reference ontology for biomedical informatics: the Foundational Model of Anatomy. Journal of Biomedical Informatics, 36(6):478–500.

Sutton, D. R. and Fox, J. (2003). The syntax and semantics of the PROforma guideline modeling language. Journal of the American Medical Informatics Association, 10(5):433–443.
