Analyzing Sepsis Treatment Variations in Subpopulations with Process Mining

F. M. Rademaker, R. H. Bemthuis, J. Jayasinghe Arachchige and F. A. Bukhsh

University of Twente, Drienerlolaan 5, 7522 NB, Enschede, The Netherlands

Keywords:

Subpopulation Analysis, Process Mining, Healthcare Processes, Sepsis.

Abstract:

Healthcare processes frequently deviate from established treatment protocols due to unforeseen events and the

complexities of illnesses. Many healthcare procedures do not account for variations in treatment paths across

different diseases and patient subpopulations. Understanding the similarities and differences in treatment

paths for different patient groups can provide valuable insights and potential process enhancements for various

subgroups of concern. For hospitals, understanding various patient populations, such as severe or non-severe

cases, is key for enhancing care paths. In this paper, we aim to compare treatment procedures for different

subpopulations of patients using process mining techniques and identify indicators to improve the care path.

We utilize the process mining for healthcare (PM²HC) methodology to identify variations in treatment paths

among different patient subgroups. We conducted a case study on sepsis, a complex illness with a wealth

of available data, for in-depth analysis. Our ﬁndings indicate that various subpopulations exhibit different

outcomes, offering promising directions for further research.

1 INTRODUCTION

Hospital Information Systems (HISs) contain a

wealth of data on healthcare processes (Mans et al.,

2013). These processes, while partially structured,

frequently involve multiple stakeholders and excep-

tion handling, which can lead to ad hoc decision-

making (Mans et al., 2015). The information stored in

a HIS can reveal valuable insights into how healthcare

processes are actually carried out in practice (Mans

et al., 2013).

This research focuses on sepsis, a life-threatening

condition typically resulting from infections, with a

mortality rate ranging from 20% to 50% (Gyawali

et al., 2019). The elderly are particularly vulnera-

ble to this condition. The mean mortality rate of

hospital-based sepsis is 35%. Approximately 10 out

of 1000 patients are diagnosed with sepsis, and 30%

of them develop Multiple Organ Dysfunction Syn-

dromes (MODS) (Polat et al., 2017). In addition to

the high mortality rate, sepsis has the second-highest

readmission rate, with 18–26% of patients returning

to the hospital within 30 days (Mans et al., 2008).

Process mining techniques present methods for

analyzing sepsis data and pinpointing the procedures

involved in sepsis treatment. Despite prior research

demonstrating the effectiveness of process mining in

analyzing sepsis event logs (Hendricks, 2019), to our

knowledge, there has been no exploration of the dif-

ferences in treatment and care pathways for various

subpopulations. While researchers have discovered

how sepsis can impact a patient (Gyawali et al., 2019),

the question of how to learn from best treatment practices remains to be addressed. As an initial step, we can compare subpopulations, emphasizing specific subgroups to better understand best practices. Attributes such as age (Martin et al., 2006), severity (Mans et al., 2008), and Systemic Inflammatory Response Syndrome (SIRS) criteria (Comstedt et al., 2009) have been shown to be reliable predictors of sepsis, which makes them natural candidates for defining subpopulations.

The exploration of processes within electronic

health record event data provides insights into patient

ﬂows. Ongoing research continues to explore these

processes across various sub-populations (Marazza

et al., 2020). This paper contributes to the aforemen-

tioned research direction by systematically identify-

ing and comparing patient sub-populations. In this

paper, we aim to analyze and contrast treatment pro-

cedures across diverse patient subpopulations using

process mining techniques. Our goal is to identify

key indicators that could signiﬁcantly enhance pa-

tient care path trajectories. To achieve this goal, we

ﬁrst categorize subpopulations by identifying distinct

Rademaker, F., Bemthuis, R., Arachchige, J. and Bukhsh, F.

Analyzing Sepsis Treatment Variations in Subpopulations with Process Mining.

DOI: 10.5220/0012600700003690

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 26th International Conference on Enterprise Information Systems (ICEIS 2024) - Volume 1, pages 85-94

ISBN: 978-989-758-692-7; ISSN: 2184-4992

treatment procedures based on attributes discovered

through literature search and data exploration. We

then apply process mining discovery techniques to

these subpopulations and compare the resulting pro-

cess models to examine the efﬁcacy of care paths. Ac-

knowledging the limitations of solely visual compar-

isons, we supplement our analysis with quantitative

evaluations. We guide our process mining project using the PM²HC methodology (Pereira et al., 2020)

and select appropriate tools/plug-ins for comparing

the process models. The results of this comparison

can provide insights into best practices for each sub-

population, facilitating the design of more personal-

ized and efﬁcient treatments and thereby improving

the overall quality of care for sepsis patients.

Our contributions are as follows: (1) we introduce an approach that incorporates the well-established PM²HC methodology for conducting a process mining project, with the added step of subpopulation analysis; (2) as an evaluation, we perform a case study on

sepsis using the proposed approach and a real-world

dataset, providing valuable insights into optimal care

paths.

The remainder of this paper is structured as fol-

lows. Section 2 discusses the background of this re-

search. Section 3 presents the approach that is fol-

lowed. Section 4 discusses the ﬁndings based on the

case study. Section 5 provides a discussion. Finally,

Section 6 concludes and discusses future work.

2 BACKGROUND AND RELATED WORK

Early diagnosis and optimal patient care are essential

for the effective management of sepsis (Gyawali et al.,

2019). Researchers have proposed a scoring system

that uses biomarkers to assess the likelihood of devel-

oping sepsis (Samraj et al., 2013). This system can as-

sist in early detection, pinpointing high-risk patients,

and monitoring the disease’s progression. One widely

recognized biomarker is the SIRS criteria, which en-

compasses measurements such as temperature (below

36°C or above 38°C), heart rate (exceeding 90 beats

per minute), respiratory rate (more than 20 breaths per

minute), and white blood cell count (10³/µL) either below 4 or above 12 (Comstedt et al., 2009).
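As a minimal illustration of how the thresholds above translate into a count of SIRS criteria met, the following Python sketch encodes the four measurements. The function and parameter names are hypothetical, and the white blood cell count is assumed to be expressed in 10³ cells/µL. It is an illustration only, not a clinical tool and not the scoring system proposed in the cited studies.

```python
def count_sirs_criteria(temperature_c, heart_rate_bpm,
                        respiratory_rate_bpm, wbc_10e3_per_ul):
    """Count how many of the four SIRS criteria described in the text are met.

    All parameter names are illustrative assumptions; the thresholds are
    the ones quoted in the paragraph above.
    """
    criteria = [
        temperature_c < 36.0 or temperature_c > 38.0,      # temperature
        heart_rate_bpm > 90,                                # heart rate
        respiratory_rate_bpm > 20,                          # respiratory rate
        wbc_10e3_per_ul < 4.0 or wbc_10e3_per_ul > 12.0,    # white blood cells
    ]
    return sum(criteria)


def meets_two_or_more_sirs(temperature_c, heart_rate_bpm,
                           respiratory_rate_bpm, wbc_10e3_per_ul):
    """Boolean flag corresponding to 'SIRS criteria >= 2'."""
    return count_sirs_criteria(temperature_c, heart_rate_bpm,
                               respiratory_rate_bpm, wbc_10e3_per_ul) >= 2
```

A boolean flag of this kind is what later allows patients to be split into two groups depending on whether at least two criteria are met.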

Numerous academic studies have delved into the

application of process mining in healthcare (Dal-

lagassa et al., 2021; Munoz-Gama et al., 2022),

also with a particular focus on contrasting processes

among varied subpopulations. Research in process

mining has been pivotal in evaluating care paths

for sepsis, notably from bottleneck and performance

viewpoints (Hendricks, 2019). One study (Parting-

ton et al., 2015) analyzed processes across four Aus-

tralian hospitals, comparing service performance and

efﬁciency. The authors devised a uniﬁed process

model encompassing the paths of patients from all

the participating hospitals. While this study provided

valuable insights into healthcare processes, the visual

representation of their comparative analysis was con-

strained. Further research is required to enhance these

visualizations and to correlate observed processes

with health outcomes (Partington et al., 2015). An-

other study (Mans et al., 2008) applied process min-

ing techniques to analyze clinical data of stroke care

across various hospitals and subpopulations. Simi-

larly, another study (Marazza et al., 2020) contrasted

cancer treatment processes across two hospitals em-

ploying process mining techniques.

However, there has been little focus on contrasting these processes specifically within

deﬁned subpopulations. While factors like patient

age, gender, and infection type can inﬂuence the pre-

scribed care path (Quintano Neira et al., 2019), there

is limited research on comparing subpopulations to

identify the most effective care paths. In contrast, our

study illustrates the delineation of sepsis patient sub-

populations and the integration of subsequent analy-

ses into a pre-existing process mining methodology.

We employ a dataset with real-world case data and

adapt the widely adopted PM² methodology for process mining projects (van Eck et al., 2015) to discern

and illustrate treatment variations across subpopula-

tions.

3 APPROACH

The modified methodology is detailed below and visualized in Figure 1. Although our approach aligns with the PM²HC methodology, which is specifically designed for the healthcare domain, we have made some modifications by introducing stakeholder roles and a subpopulation selection phase for simplification purposes.

3.1 Research Planning

In the ﬁrst phase, a healthcare process is selected, and

research goals are deﬁned. During this phase, the

scope and metrics to be used for comparing process

models should also be determined. Additionally, one

must select the tools and algorithms for process ex-

ploration and mining.

We selected the sepsis dataset (Mannhardt, 2016) for comparison using BPMNDiffViz and ProM (van Dongen et al., 2005). Section 4.2 explains how we compare the process models using the graph edit distance metric and conformance checking metrics.

Figure 1: Visualization of the approach, based on the PM²HC methodology (Pereira et al., 2020). (Diagram of the stages 1. Planning, 2. Extraction, 3. Data Processing, 4. Subpopulation Selection, 5. Mining and Analysis, 6. Evaluation, and 7. Process Improvements from Best Practices.)

3.2 Extraction

In the extraction phase, the study’s boundaries are fur-

ther determined. It involves selecting the relevant data

and excluding irrelevant information. In this research,

we extracted and retrieved event data related to sep-

sis cases from a hospital dataset (Mannhardt, 2016).

The extracted data underwent cleaning and preparation. We performed data preparation steps as discussed by Mannhardt and Blinde (2017). We have

limited the scope of the study to real-life data of pa-

tients who were admitted to the hospital’s emergency

room (Hendricks, 2019). Each case is represented by

a trace, which records the patient’s journey through

the hospital. More information about the dataset can

be found in Subsection 4.1.

3.3 Data Processing

This phase encompasses an iterative analysis process

requiring iteration between the third, fourth, and ﬁfth

phases of our methodology. In this phase, the data is

processed by creating visualizations of the processes

(i.e., process discovery). The typical steps involved

in this phase include aggregating events, ﬁltering and

enriching logs, and identifying performance indica-

tors. These steps culminate in the outcomes of the

third, fourth, and ﬁfth phases.

The visualizations created in this phase provide

more insight into the recorded events. We use dotted charts and process model visualizations. To do this, we import the XES file containing the event logs into the ProM platform and then filter the log into subpopulations using the “Filter Event Log” and “Filter Log by Attributes” plugins.

3.4 Subpopulation Selection

The subpopulation selection phase is introduced as an additional step to the PM²HC methodology for data processing. This step involves using the “LogVisualiser (LogDialog)” plugin to analyze the data and conducting literature research to identify relevant attributes (such as age, severity, and process duration) that can be used to create subpopulations. As suggested by Mamaliga (2013), the data should

be segmented into data cubes based on a combination

of these attributes. A more detailed explanation of

subpopulation selection and analysis can be found in

Subsection 4.1.

3.5 Mining & Analysis

In the ﬁfth phase, process-related data is mined and

analyzed to gain insight into different treatment paths

and care paths. The main objective of this phase is

to derive insights from the sepsis treatment processes.

Performance analysis is conducted to gain insights,

and the models created are evaluated through confor-

mance analysis.

To further examine the process models, the “Inductive Visual Miner” plugin is employed. This tool helps to analyze the number of resources, such as individuals, that follow specific activities, to identify relative paths, and to locate bottlenecks. Additionally, performance indicators are identified using the

tool, which also enables performance and confor-

mance analysis. Subsection 4.3 provides an explana-

tion of the tool’s implementation.

3.6 Evaluation

The primary objective of this phase is to gain in-

sights into the processes involved in sepsis treatment.

The numerical values obtained are translated into new

learning perspectives and suggestions for improve-

ment, ultimately leading to conclusive ﬁndings.

In our case study, we evaluated the results ob-

tained from the comparisons made using BPMNDif-

fViz and observations gleaned from the “Inductive Vi-

sual Miner” plugin. We have supported our evaluation

through scientiﬁc literature.

3.7 Improvement & Support

In the ﬁnal phase, the ﬁndings are evaluated, future

implementation plans are developed, and suggestions

for improvements are made. The aim is to provide an

optimal path for future learning guided by best prac-

tices. During this phase, all results are evaluated and

interpreted. However, as this is the ﬁnal phase of the

research, it excludes the execution of the actual im-

plementation plan. For future research, we are in the

process of obtaining a sepsis dataset from hospitals in

the Netherlands. Stakeholders can use the results ob-

tained from this phase as a reference scenario for data

preparation and extraction in subsequent studies.

4 FINDINGS

This section describes the findings and the results of executing the steps described in the previous section.

4.1 Division of Subpopulations

As mentioned previously, subpopulations are classi-

ﬁed based on speciﬁc attributes and their relation to

the diagnosis of sepsis, as well as the severity level

that the attribute suggests. The dataset comprises 31

attributes, primarily consisting of blood values and di-

agnoses. The attributes used for categorizing the data

into different subpopulations are age, and the num-

ber of SIRS criteria met (SIRS criteria ≥ 2, which in-

dicates an increased likelihood to be diagnosed with

sepsis (Comstedt et al., 2009)). The division of sub-

populations was based on age, given its role as an im-

portant risk factor in predicting sepsis cases (Li et al.,

2022). Besides the SIRS criteria, the dataset used did

not capture other risk factors. Therefore, age and the

SIRS criteria were considered the most important risk

factors for dividing the subpopulations. The subpop-

ulations are named and summarized in Table 1. The

ﬁrst column lists the subpopulations, while the ﬁrst

row explains the criteria that define each subpopulation. For instance, the subpopulation that includes patients aged 65 and below is labelled Age A, and the subpopulation with patients who meet fewer than two SIRS criteria is called SIRS-A. The nomenclature for the remaining subpopulations follows the

same pattern.

Please note that the duration of a treatment process is not given beforehand and needs to be calculated. This duration is classified into two categories: Duration A, which denotes a treatment process that takes less than or equal to 7 days, and Duration B, which denotes a process that takes more than 7 days. The event log records the time at which an activity starts or ends, but there is no single unit of time for all activities, so the total duration cannot be read directly from the log. Instead, the duration is calculated as the difference between the starting and ending times of the treatment process.
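The duration calculation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each trace is represented simply as a list of event timestamps, the treatment process is taken to start at the earliest and end at the latest timestamp, and the function names are hypothetical rather than part of any tool used in this study.

```python
from datetime import datetime, timedelta


def trace_duration(timestamps):
    """Duration of one trace: latest event time minus earliest event time."""
    if not timestamps:
        raise ValueError("a trace needs at least one event timestamp")
    return max(timestamps) - min(timestamps)


def duration_label(timestamps, threshold=timedelta(days=7)):
    """Return 'Duration A' for processes of at most 7 days, else 'Duration B'."""
    if trace_duration(timestamps) <= threshold:
        return "Duration A"
    return "Duration B"


# Toy trace with made-up timestamps (not taken from the dataset).
example_trace = [
    datetime(2014, 1, 1, 8, 0),
    datetime(2014, 1, 3, 12, 30),
    datetime(2014, 1, 6, 9, 15),
]
example_label = duration_label(example_trace)
```

Working with timestamp differences avoids having to assume a common time unit for individual activities.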

The patients in the recorded data have an average age of 70.07 years. To create subpopulations of roughly

equal size, the event log is divided at the ages of 65

and 85, resulting in three subpopulations. The ﬁrst

subpopulation, Age A, includes process traces of pa-

tients who are 65 years old or below. The second sub-

population, Age B, includes patients who are between

65 and 85 years old. The third subpopulation, Age C,

includes patients who are 85 or older.

In the United States, over half of the patients in the

Intensive Care Unit are over 65 years old, and many

suffer from life-threatening sepsis (Starr and Saito,

2014). Therefore, the age of 65 is used as a thresh-

old for the ﬁrst and second subpopulations.

Since the SIRS attribute (two or more criteria met) can only be true or false, the dataset is divided into two cubes. The first subpopulation, SIRS-A, includes patients who meet 0 or

1 SIRS criteria. The second subpopulation, SIRS-B,

includes patients who meet 2 or more SIRS criteria.
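The division into age and SIRS subpopulations can be illustrated with a short Python sketch. It assumes that each case is available as a dictionary with the hypothetical keys "age" and "sirs_2_or_more"; these names, and the helper functions, are assumptions made for illustration and do not reflect the ProM plugins used in the study.

```python
def age_label(age):
    """Map an age in years to Age A (<= 65), Age B (65 < age < 85) or Age C (>= 85)."""
    if age <= 65:
        return "Age A"
    if age < 85:
        return "Age B"
    return "Age C"


def sirs_label(sirs_2_or_more):
    """Map the boolean 'two or more SIRS criteria met' flag to SIRS-A or SIRS-B."""
    return "SIRS-B" if sirs_2_or_more else "SIRS-A"


def split_by(cases, key_func):
    """Group cases into subpopulations according to the label returned by key_func."""
    groups = {}
    for case in cases:
        groups.setdefault(key_func(case), []).append(case)
    return groups


# Toy cases with made-up values (not taken from the dataset).
cases = [
    {"case_id": "c1", "age": 50, "sirs_2_or_more": True},
    {"case_id": "c2", "age": 70, "sirs_2_or_more": False},
    {"case_id": "c3", "age": 90, "sirs_2_or_more": True},
]
by_age = split_by(cases, lambda case: age_label(case["age"]))
by_sirs = split_by(cases, lambda case: sirs_label(case["sirs_2_or_more"]))
```

Each resulting group corresponds to one sub-log for which a separate process model can be discovered.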

Table 1: Division of subpopulations.

Label             Age (years)     SIRS criteria ≥ 2   Process duration (days)
Subpopulation A   ≤ 65            False               ≤ 7
Subpopulation B   65 < age < 85   True                > 7
Subpopulation C   ≥ 85            n/a                 n/a

4.2 Comparison Tools

We utilized the BPMNDiffViz tool to compare the pro-

cess models. This tool offers structural matching by

visualizing the differences between graphs and pro-

vides statistics to facilitate difference analysis (Dijk-

man et al., 2011). It computes the minimum graph

edit distance between two processes based on the

number of transformations required to change one

process into another using the event labels of activ-

ity nodes (Ivanov et al., 2015). Among other algo-

rithms, we chose the Tabu Search algorithm due to its

precise results and faster performance (Skobtsov and

Kalenkova, 2019).
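To make the notion of a graph edit distance more concrete, the sketch below computes a strongly simplified variant in Python. It is not the algorithm implemented in BPMNDiffViz and does not use Tabu Search. It assumes that each model is given as a set of directed edges between uniquely labelled activity nodes, that nodes can only be matched when their labels are identical, and that every node or edge insertion or deletion costs one edit. The function name and the toy models are hypothetical.

```python
def label_matched_edit_distance(edges_a, edges_b):
    """Simplified edit distance between two directed graphs.

    Nodes are matched by identical label only, so the distance is the
    number of nodes plus the number of edges present in exactly one graph.
    """
    edges_a, edges_b = set(edges_a), set(edges_b)
    nodes_a = {node for edge in edges_a for node in edge}
    nodes_b = {node for edge in edges_b for node in edge}
    node_edits = len(nodes_a ^ nodes_b)   # nodes to insert or delete
    edge_edits = len(edges_a ^ edges_b)   # edges to insert or delete
    return node_edits + edge_edits


# Two toy models with made-up control flow (not the discovered models).
model_a = {
    ("ER Registration", "ER Triage"),
    ("ER Triage", "CRP"),
    ("CRP", "Release A"),
}
model_b = {
    ("ER Registration", "ER Triage"),
    ("ER Triage", "Leucocytes"),
    ("Leucocytes", "Release A"),
}
toy_distance = label_matched_edit_distance(model_a, model_b)
```

In this toy case two nodes and four edges differ. A full graph edit distance additionally searches over possible node mappings, which is why heuristic search algorithms are used in practice.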

We performed conformance checking by compar-

ing the percentage of total traces that perform a spe-

ciﬁc activity within a subpopulation. This analysis

revealed which subpopulation is most likely to follow

a particular activity.

4.3 Comparison of Treatment Processes

The comparison was based on the attribute by which

each subpopulation was segmented. We also con-

ducted a detailed analysis of certain activities within

the process models and compared them for conformance. All the models were created using the ProM and BPMNDiffViz tools and were saved in an online data repository. Next, we will discuss the main findings of our analysis.

4.3.1 General Observations

In general, the process models consist of 12 to 16

activities, with most models containing either 14 or

16 activities. In all models, ER Registration and ER

Triage occur at the beginning in parallel. The activ-

ities that describe the patient discharge (Release Ac-

tivities A, B, C, D, E) are typically found at the end

of the treatment event(s). Only the activity Return-

ER occurs after a patient has been discharged in any

form.

Of all patients, 63.8% go through Release-A, while Release B, C, D, and E combined are followed by only approximately 5.5% of all patients. This also implies that the patients not covered by those statistics did not finish the process, for example, because they were still in the hospital. The activities CRP (i.e., checking the C-reactive protein level in a blood sample) and Leucocytes are the most frequently performed activities in all processes, often occurring more than once in a single process.

Note: the URL of the online data repository will be made available upon acceptance.

Figure 2: Conformance checking activity comparison for CRP and Leucocytes. (Y-axis: percentage of traces within subpopulation performing specified event; series: CRP and Leucocytes; subpopulations: SIRS < 2, SIRS ≥ 2, ≤ 7 days, > 7 days, ≤ 65, 65 < age < 84, ≥ 84.)

To compare the process models of subpopulations,

we calculate the Graph Edit Distance (GED). This

metric indicates the transformations required to con-

vert one process model into another. We have pre-

sented the results of these comparisons in Table 2.

Furthermore, we have analyzed the number of traces

following the events related to leucocytes and CRP.

To analyze this, we use a metric called the number of

traces, which is the total number of occurrences of an

event by a subpopulation divided by the total number

of patients in that subpopulation. As some events oc-

cur multiple times within one process, the resulting

percentages may exceed 100%. We have visualized

the values of all processes for CRP and leucocytes in

Figure 2.
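The metric described above can be expressed compactly in Python. The sketch assumes that each trace is a list of activity names and that a subpopulation is a list of such traces; the function name and the toy traces are hypothetical and serve only to show why values above 100% can occur.

```python
def occurrence_percentage(traces, activity):
    """Total occurrences of an activity divided by the number of traces, in percent."""
    if not traces:
        raise ValueError("the subpopulation must contain at least one trace")
    total_occurrences = sum(trace.count(activity) for trace in traces)
    return 100.0 * total_occurrences / len(traces)


# Toy subpopulation with made-up traces (not taken from the dataset).
toy_traces = [
    ["ER Registration", "ER Triage", "CRP", "Leucocytes", "CRP", "CRP"],
    ["ER Registration", "ER Triage", "CRP"],
]
crp_percentage = occurrence_percentage(toy_traces, "CRP")
leucocytes_percentage = occurrence_percentage(toy_traces, "Leucocytes")
```

Here CRP occurs four times across two traces, so the metric exceeds 100% even though it is reported as a percentage of traces.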

Compared to most diseases, patients with sepsis

have higher mortality and readmission rates (Mans

et al., 2008). Therefore, in this study, we focus on

the discharge activity and the readmission of patients

to the ER for different subpopulations. For illustration purposes, we analyze the process traces following Release-A and Return-ER using the number of traces metric, as described above. We also compare the number of traces following Release-A that eventually lead to readmission to the ER across different subpopulations. Figure 3 visualizes our results.

Table 2: GEDs retrieved by comparing process models.

Attribute          Subpopulation 1   Subpopulation 2   Number of edits (transformations)
Age                ≤ 65              65 < age < 85     72
Age                ≤ 65              ≥ 85              42
Age                65 < age < 85     ≥ 85              60
SIRS criteria      ≥ 2               < 2               58
Process duration   ≤ 7 days          > 7 days          98

Figure 3: Conformance checking activity comparison for Release-A and Return-ER. (Y-axis: percentage of traces within subpopulation performing specified event; series: Return-ER, Release-A, and percentage of Release-A returning to ER; subpopulations: SIRS < 2, SIRS ≥ 2, ≤ 7 days, > 7 days, ≤ 65, 65 < age < 84, ≥ 84.)

4.3.2 Age

The age attribute segments the data into three distinct

subpopulations, each represented by its own model.

Figure 4 includes the model comparing subpopulations Age A (≤ 65) and Age B (65 < age < 85), which resulted in a GED score of 72.

When comparing processes for individuals aged ≤ 65 to those aged between 65 and 85, a GED of 72 suggests a notable difference between the two processes. For individuals aged ≤ 65 compared to those aged ≥ 85, the GED is 42. This is somewhat counterintuitive, as one might expect a larger difference between the youngest and oldest age groups; instead, the processes for these two age groups are more similar than those in the previous comparison. The processes for the age groups 65 < age < 85 and ≥ 85 have a GED of 60, indicating a moderate difference between the two processes.

Figure 5 shows a segment of the process model for subpopulation Age A, where Leucocytes and CRP are the most commonly performed activities. In Age A, Leucocytes events amount to 282.3% and CRP events to 273.1% of the number of traces, as indicated in Figure 2. For Age B, CRP events amount to 346% and Leucocytes events to 369.8%. Finally, for Age C, CRP events amount to 308.9% and Leucocytes events to 307.6%.

When comparing the three models’ patient discharge strategies, the most significant differences are observed between Age A and Age B. The highest return rates are associated with Age B (65 < age < 85), both for returns to the ER after Release-A and for Return-ER events overall. Age C follows closely on both measures. Conversely, Age A has the lowest rate of return to the ER among the three age groups.

4.3.3 SIRS Criteria

The two models compared in this subsection, (a) ‘SIRS < 2’ and (b) ‘SIRS ≥ 2’, are denoted as SIRS-A and SIRS-B, respectively. The processes for

individuals with SIRS criteria ≥ 2 compared to those

with < 2 have a GED of 58. This suggests a moderate

difference between the processes for these two groups

based on the SIRS criteria.

Upon examining the SIRS-A subpopulation, we observed that the mean number of included event classes is 6, while for SIRS-B, it is 10. Thus, it can be concluded that the processes of patients who meet two or more SIRS criteria include a larger number of different events overall.

In comparison to other subpopulations, the occurrences of CRP and Leucocytes in SIRS-A are lower, with events occurring in 78.2% and 80.7% of traces, respectively. However, for SIRS-B (‘SIRS ≥ 2’), CRP events appear in 336.7% of traces, and Leucocytes events in 346.8%.

For patients in SIRS-B, 70.1% were discharged

through Release-A, of which 41.6% returned to the

ER. In the whole subpopulation, 31.1% returned to

the ER. In both SIRS-A and SIRS-B subpopulations,

all patients returning to the ER had been admitted to

the normal care (NC) ward earlier in their treatment.

Figure 4: Comparison between process models of subpopulations Age A and Age B. Red lines highlight events that need removal to transform one model to the other, while black lines denote identical events. Further comparison models are available in an online data repository.

4.3.4 Process Duration

The comparison between processes with a duration of

≤ 7 days and those with a duration of > 7 days yields

the highest GED of 98. This indicates a signiﬁcant

difference between the processes of these two groups.

It suggests that the duration of the process has a sub-

stantial impact on the process model. However, it is

worth noting that the maximum number of activities

included in the two models differs. Treatments with

a duration of less than or equal to a week include 12

different activities, while longer treatments include all

16 kinds of activities. ER returns are not included

in shorter treatments, implying that no returns occur

within the same week as a patient’s hospital admis-

sion. Notably, the minimum time between admission

and return is 7 days and 11 hours. On average, pa-

tients return after 91 days.
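The minimum and average time between admission and return reported above could be derived along the following lines. This is a minimal sketch under stated assumptions: each trace is a list of (activity, timestamp) pairs, admission is taken to be the earliest event of the trace, and the return is the first event labelled "Return-ER". The function names and the toy traces are hypothetical.

```python
from datetime import datetime, timedelta


def return_intervals(traces, return_activity="Return-ER"):
    """Time from the first event of a trace to its first return event.

    Traces without a return event are skipped.
    """
    intervals = []
    for events in traces:
        start = min(timestamp for _, timestamp in events)
        returns = [timestamp for name, timestamp in events
                   if name == return_activity]
        if returns:
            intervals.append(min(returns) - start)
    return intervals


def mean_interval(intervals):
    """Arithmetic mean of a non-empty list of timedelta values."""
    return sum(intervals, timedelta()) / len(intervals)


# Toy traces with made-up timestamps (not taken from the dataset).
toy_traces = [
    [("ER Registration", datetime(2014, 1, 1, 8)),
     ("Release-A", datetime(2014, 1, 5, 8)),
     ("Return-ER", datetime(2014, 1, 11, 8))],
    [("ER Registration", datetime(2014, 2, 1, 8)),
     ("Release-A", datetime(2014, 2, 3, 8)),
     ("Return-ER", datetime(2014, 2, 21, 8))],
    [("ER Registration", datetime(2014, 3, 1, 8)),
     ("Release-A", datetime(2014, 3, 2, 8))],
]
toy_intervals = return_intervals(toy_traces)
```

Taking the minimum of these intervals shows directly whether any return falls within the seven-day threshold that separates the two duration subpopulations.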

4.3.5 Concluding Remarks

The attribute with the most pronounced impact on

process differences, as indicated by the GED, is ‘Pro-

cess duration’. Processes that last ≤ 7 days are con-

siderably different from those lasting > 7 days. Age

also plays a role in process differences, but the rela-

tionship is not linear. The processes for the youngest

and oldest age groups are more similar than the pro-

cesses for the youngest and middle age groups. The

SIRS criteria also inﬂuence process differences but to

a lesser extent than age and process duration.

In conclusion, this analysis provides insights into

how different attributes inﬂuence process models.

Such ﬁndings can be crucial for tailoring interven-

tions or strategies speciﬁc to subpopulations based on

these attributes.

5 DISCUSSION

While our preliminary research aimed to understand

sepsis and its treatment, the involvement of medi-

cal experts would have enhanced the identiﬁcation of

treatment peculiarities. Collaboration with a hospital would have enabled a more detailed assessment of treatment processes and verification of our results, and would have helped to ensure patient safety. Furthermore, we propose to

enhance the discussion by incorporating interpreta-

tions provided by a medical professional. This has

the potential to increase the depth of the ﬁndings and

their applicability in real-world clinical contexts.

We used data from various sepsis treatment events within a hospital. However, the one-time assessment of attributes like age, blood values, and diagnoses limited our ability to perform a detailed analysis of these differences during treatment. Attributes that were anonymized in the dataset, such as patient gender, could have provided richer data and deeper insights, especially regarding processes leading to death, given the high mortality rate of sepsis. Furthermore, other features not included in the dataset (e.g., history or genetics) could also contribute to a more comprehensive result. In this study, we did not explore the generalizability of our findings or their applicability in other hospitals or for other illnesses. Potential biases associated with the specific undisclosed hospital might have influenced the data. Nonetheless, our study highlights the feasibility of comparing process models within a hospital setting using GED and conformance metrics.

Figure 5: Process model abstraction of patients with age ≤ 65, using the Inductive Visual Miner (IvM) plugin. The complete process model and other models are available online.

In our study, the metrics employed for compari-

son yield initial insights. Key to these insights is the

utilization of the GED via BPMNDiffViz. The integra-

tion of BPMNDiffViz in GED computations facilitates

visualization and discernment of the inherent struc-

tural variances between process models. This gains

prominence in the context of subpopulation analyses,

enabling a granular juxtaposition of process naviga-

tional patterns across varied groups. Additionally, the

derivation of conformance metrics, anchored on the

frequency of event execution by subpopulations, pro-

vides a lens to evaluate the alignment of these cohorts

with established process models. While these metrics

are useful, their role in comparing models across dif-

ferent subpopulations needs more research.

6 CONCLUSIONS AND FUTURE RESEARCH

In this study, we aimed to explore the challenge of

contrasting subpopulations within healthcare treat-

ment processes. Our focus was on sepsis, a condition

characterized by a multitude of treatment procedures.

We applied the PM²HC methodology to a case study

using real-world data.

Our investigation focused on the treatment trajec-

tories of patients, taking into account factors such as

age, severity, and SIRS criteria. Our ﬁndings revealed

that distinct treatment processes were required for dif-

ferent age groups. Furthermore, we found that seg-

menting patients into two groups based on a duration

threshold of seven days was beneficial for contrasting subpopulations. A notable correlation was identified between age group division and SIRS score,

with the middle-aged subpopulation engaging in the

most activities. The transformation from the pro-

cess model for Age A to Age C required only 42

edits. In contrast, patients who met the SIRS-B cri-

teria participated in approximately double the activi-

ties per patient compared to those in SIRS-A. In the

subpopulation exceeding a seven-day duration, activities related to Leucocytes, CRP, Return-ER, and patient

discharge were most prevalent. Our results suggest

that treatment processes tailored to patient subpopula-

tions based on age, severity, and SIRS criteria provide

unique and promising insights.
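As an illustration of the duration-based segmentation summarized above, the sketch below splits an event log into cases lasting at most seven days and cases exceeding seven days, and then computes the average number of activities per case in each group. The event tuples, case identifiers, and function names are hypothetical, and case duration is assumed to be the time between the first and last recorded event of a case. It is a sketch under these assumptions rather than the procedure used in the case study.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_by_duration(events, threshold=timedelta(days=7)):
    """Group events per case and split cases by total duration.

    events: iterable of (case_id, activity, timestamp) tuples.
    Returns two dicts (short, long) mapping case_id -> ordered activity list.
    """
    cases = defaultdict(list)
    for case_id, activity, timestamp in events:
        cases[case_id].append((timestamp, activity))

    short, long_ = {}, {}
    for case_id, case_events in cases.items():
        case_events.sort()  # order events chronologically
        duration = case_events[-1][0] - case_events[0][0]
        target = long_ if duration > threshold else short
        target[case_id] = [activity for _, activity in case_events]
    return short, long_


def activities_per_case(group):
    """Average number of recorded activities per case in a group."""
    if not group:
        return 0.0
    return sum(len(trace) for trace in group.values()) / len(group)


# Hypothetical toy log (illustrative only, not taken from the sepsis data set).
events = [
    ("c1", "ER Registration", datetime(2024, 1, 1, 8, 0)),
    ("c1", "CRP", datetime(2024, 1, 1, 9, 0)),
    ("c1", "Release A", datetime(2024, 1, 3, 12, 0)),
    ("c2", "ER Registration", datetime(2024, 1, 2, 10, 0)),
    ("c2", "Leucocytes", datetime(2024, 1, 2, 11, 0)),
    ("c2", "CRP", datetime(2024, 1, 5, 11, 0)),
    ("c2", "Release A", datetime(2024, 1, 12, 9, 0)),
    ("c2", "Return ER", datetime(2024, 2, 1, 9, 0)),
]

short_cases, long_cases = split_by_duration(events)
print(activities_per_case(short_cases), activities_per_case(long_cases))
```

The same grouping step could be reused for other subpopulation criteria, such as age group or SIRS class, by replacing the duration test with a test on a case attribute.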

Future studies should conduct an in-depth inves-

tigation of the performance of various subpopula-

tions. This investigation could include both threshold

and time-series analysis. Comparing outcomes across

these subpopulations and benchmarking them against

normative models of other healthcare providers could

provide valuable insights. Furthermore, collaborative

initiatives with hospitals to collect treatment data or

explore challenges in the treatment process could en-

hance our understanding of the implications of this

study. The conformance measures used in this study

also warrant further scrutiny to validate their effec-

tiveness. Lastly, we advocate for additional case stud-

ies on healthcare-related topics that employ compara-

tive subpopulation analysis. The goal of these studies

would be to generalize the implications of our ﬁnd-

ings to other hospitals and healthcare systems.

REFERENCES

Comstedt, P., Storgaard, M., and Lassen, A. T. (2009). The

Systemic Inﬂammatory Response Syndrome (SIRS)

in acutely hospitalised medical patients: a cohort

study. Scandinavian journal of trauma, resuscitation

and emergency medicine, 17:67.

Dallagassa, M. R., dos Santos Garcia, C., Scalabrin, E. E.,

Ioshii, S. O., and Carvalho, D. R. (2021). Oppor-

tunities and challenges for applying process mining

in healthcare: a systematic mapping study. Journal

of Ambient Intelligence and Humanized Computing,

pages 1–18.

Dijkman, R., Dumas, M., van Dongen, B., Käärik, R., and

Mendling, J. (2011). Similarity of business process

models: Metrics and evaluation. Information Systems,

36(2):498–516.

Gyawali, B., Ramakrishna, K., and Dhamoon, A. (2019).

Sepsis: The evolution in deﬁnition, pathophysi-

ology, and management. SAGE Open Medicine,

7:2050312119835043.

Hendricks, R. M. (2019). Process mining of incoming pa-

tients with sepsis. Online Journal of Public Health

Informatics, 11(2):224–36.

Ivanov, S. Y., Kalenkova, A. A., and van der Aalst, W.

M. P. (2015). BPMNDiffViz: a tool for BPMN mod-

els comparison. In Proceedings of the Demo Ses-

sion of the 13th International Conference on Business

Process Management, CEUR Workshop Proceedings,

pages 35–39.

Li, M., Huang, P., Xu, W., Zhou, Z., Xie, Y., Chen, C.,

Jiang, Y., Cui, G., Zhao, Q., and Wang, R. (2022).

Risk factors and a prediction model for sepsis: A multicenter retrospective study in China. Journal of Intensive Medicine, 2(3):183–188.

Mamaliga, T. (2013). Realizing a process cube allowing

for the comparison of event data. Master’s thesis, TU

Eindhoven.

Mannhardt, F. (2016). UMass sepsis cases - event log.

Mannhardt, F. and Blinde, D. (2017). Analyzing the trajec-

tories of patients with sepsis using process mining. In

RADAR+EMISA 2017, pages 72–80. CEUR-WS.org.

Mans, R., Schonenberg, H., Leonardi, G., Panzarasa, S.,

Cavallini, A., Quaglini, S., and van der Aalst, W. M. P.

(2008). Process mining techniques: an application to

stroke care. In Studies in Health Technology and In-

formatics, volume 136, pages 573–8.

Mans, R. S., van der Aalst, W. M. P., and Vanwersch, R.

J. B. (2013). Process mining in healthcare: opportunities beyond the ordinary, volume 1326 of BPM reports. BPMcenter.org.

Mans, R. S., van der Aalst, W. M. P., and Vanwersch, R.

J. B. (2015). Process mining in healthcare: evaluat-

ing and exploiting operational healthcare processes.

Springer.

Marazza, F., Bukhsh, F. A., Geerdink, J., Vijlbrief, O.,

Pathak, S., van Keulen, M., and Seifert, C. (2020).

Automatic process comparison for subpopulations:

Application in cancer care. International Journal of

Environmental Research and Public Health, 17(16).

Martin, G. S., Mannino, D. M., and Moss, M. (2006). The

effect of age on the development and outcome of adult

sepsis. Critical Care Medicine, 34(1):15–21.

Munoz-Gama, J., Martin, N., Fernandez-Llatas, C., Johnson, O. A., Sepúlveda, M., Helm, E., Galvez-Yanjari,

V., Rojas, E., Martinez-Millana, A., Aloini, D., Aman-

tea, I. A., Andrews, R., Arias, M., Beerepoot, I.,

Benevento, E., Burattin, A., Capurro, D., Carmona,

J., Comuzzi, M., Dalmas, B., de la Fuente, R.,

Di Francescomarino, C., Di Ciccio, C., Gatta, R.,

Ghidini, C., Gonzalez-Lopez, F., Ibanez-Sanchez, G.,

Klasky, H. B., Prima Kurniati, A., Lu, X., Mannhardt,

F., Mans, R., Marcos, M., Medeiros de Carvalho,

R., Pegoraro, M., Poon, S. K., Pufahl, L., Reijers,

H. A., Remy, S., Rinderle-Ma, S., Sacchi, L., Seoane,

F., Song, M., Stefanini, A., Sulis, E., Ter Hofst-

ede, A. H. M., Toussaint, P. J., Traver, V., Valero-

Ramon, Z., Weerd, I., van der Aalst, W. M. P., Van-

wersch, R., Weske, M., Wynn, M. T., and Zerbato, F.

(2022). Process mining for healthcare: Characteristics

and challenges. Journal of Biomedical Informatics,

127:103994.

Partington, A., Wynn, M., Suriadi, S., Ouyang, C., and

Karnon, J. (2015). Process mining for clinical processes: A comparative analysis of four Australian hospitals. ACM Transactions on Management Information Systems, 5(4).

Pereira, G. B., Santos, E. A. P., and Maceno, M. M. C.

(2020). Process mining project methodology in

healthcare: a case study in a tertiary hospital. Network

Modeling Analysis in Health Informatics and Bioin-

formatics, 9.

Polat, G., Ugan, R. A., Cadirci, E., and Halici, Z. (2017).

Sepsis and septic shock: Current treatment strate-

gies and new approaches. The Eurasian journal of

medicine, 49(1):53–58.

Quintano Neira, R. A., Hompes, B. F. A., de Vries, J. G. J.,

Mazza, B. F., Simões de Almeida, S. L., Stretton, E.,

Buijs, J. C. A. M., and Hamacher, S. (2019). Anal-

ysis and optimization of a sepsis clinical pathway us-

ing process mining. In Business Process Management

Workshops, pages 459–470. Springer.

Samraj, R., Zingarelli, B., and Wong, H. (2013). Role of

biomarkers in sepsis care. Shock, 40(5):358–365.

Skobtsov, A. and Kalenkova, A. (2019). Efﬁcient algo-

rithms for ﬁnding differences between process mod-

els. In 2019 Ivannikov Ispras Open Conference (IS-

PRAS), pages 60–66.

Starr, M. and Saito, H. (2014). Sepsis in old age: Re-

view of human and animal studies. Aging and disease,

5(2):126–136.

van Dongen, B. F., de Medeiros, A. K. A., Verbeek, H.

M. W., Weijters, A. J. M. M., and van der Aalst, W.

M. P. (2005). The ProM framework: A new era in

process mining tool support. In Applications and The-

ory of Petri Nets 2005, pages 444–454, Berlin, Hei-

delberg. Springer.

van Eck, M. L., Lu, X., Leemans, S. J. J., and van der

Aalst, W. M. P. (2015). PM²: a process mining project

methodology. In International conference on ad-

vanced information systems engineering, pages 297–

313. Springer.
