Enabling Interactive Process Analysis with Process Mining and Visual Analytics

P. M. Dixit¹,², H. S. Garcia Caballero¹,², A. Corvò¹,², B. F. A. Hompes¹,², J. C. A. M. Buijs¹ and W. M. P. van der Aalst¹

¹ Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
² Philips Research, Eindhoven, The Netherlands
Keywords:
Process Mining, Visual Analytics, Compliance Analysis, Performance Analysis, Classification.
Abstract:
In a typical healthcare setting, hospitals can define specific clinical care pathways. Process mining provides a way to analyze these care pathways using the event data extracted from hospital information systems. It can be used to optimize the overall care pathway, to gain interesting insights into the actual execution of the process, and to compare expectations against reality. In this paper, a generic novel tool called InterPretA is introduced, which builds upon pre-existing process mining and visual analytics techniques to enable the user to perform such process-oriented analysis. InterPretA contains a set of options to provide high-level conformance analysis of a process from different perspectives. Furthermore, InterPretA enables detailed investigative analysis by letting the user interactively analyze, visualize and explore the execution of processes from the data perspective.
1 INTRODUCTION
Business process management systems and other “process aware” information systems (CRM, ERP, etc.) are gaining popularity across many domains, including the healthcare sector (Dwivedi et al., 2001). In a healthcare setting, the processes, generally referred to as care pathways, are used to map the flow of patients, in order to enable efficient operations in a hospital and to standardize the treatment plan across patients (Sutherland and van den Heuvel, 2006; Weigl et al., 2012). However, patients in a hospital may have different profiles and hence different care pathways. Furthermore, the scope of a care pathway can cover a broad spectrum, from medical guidelines to patient logistics. Thereby, the notion of a process is highly sophisticated and flexible in a healthcare setting. In order to standardize the overall process, make the steps evident to all members involved in the treatment plan, and improve the overall efficiency, there is a strong emphasis on modeling the steps of the processes and making them explicit and better managed. Alternatively, these steps may be inherently known by the participants of the process and not formally documented anywhere. In such cases, only the primary users may be aware of the important steps in the process. Due to the lack of documentation, standardization and optimization of the process can become challenging. Moreover, if resources, e.g. nurses, are replaced, then the new resources may be unaware of the logical flow of the process. Hospital Information Systems (HIS) are designed to record all the information that is generated in a hospital, such as the treatment plan, the interactions between patients and medical staff, exams and treatment procedures, logistics, and medical decisions (Graeber, 1997). This information can be used effectively to analyze the process performance and to gain insights into the process, with the intention of optimizing the care pathways.
Process mining acts as a key enabler for the analysis of process-based systems using event logs (Aalst, 2016). Event logs can be extracted from the HIS of the hospital. On a broad level, process mining can be categorized into process discovery and conformance analysis. As the name suggests, process discovery techniques aim at automatically discovering process models using the information in the event log gathered from the HIS. Conformance techniques use a pre-defined process model and map it to the corresponding event log, to determine how well the process model describes the behavior recorded in the event log.
If the event logs extracted from the HIS are considered to be an accurate depiction of reality, and if the process model defines the ideal process flow, then the analysis performed by conformance techniques can provide valuable insights into what went right and what went wrong in the process.
Figure 1: Conformance analysis view of a real-world process model (used in the case study). The sheer size of the process model makes it difficult to draw inferences by merely looking at the whole process in a holistic manner, generating the need for smart interactive analysis. The figure is used for illustrative purposes only, to represent the scale and nature of a large healthcare process.
Conformance analysis can be used to gain insights into the compliance aspect of the process, such as understanding where deviations occur in a process (Adriansyah et al., 2011; Munoz-Gama et al., 2014). As there is a notion of time associated with every step in a process, conformance analysis can also be used to investigate the performance aspect of the process, for example to understand where bottlenecks occur. Root cause analysis can then be performed using traditional classification techniques to understand the reasons behind the deviations and bottlenecks in a process.
Although conformance results enable compliance and performance analysis, there are still some open challenges which can make the analysis difficult. This is specifically true for healthcare processes, which are extremely flexible and dependent on multiple factors, making them very complex. Process discovery on an event log from a HIS usually results in a complicated and/or huge process model, as shown in Figure 1. Most current visualizations display the conformance results in their entirety. From a user perspective, displaying the conformance results on a big and complicated process model with many nodes may turn out to be too daunting. Moreover, most current techniques are standalone techniques and do not support effective combinations of intra-disciplinary analyses. Furthermore, investigating the satisfaction of specific protocols and/or KPIs in the process is usually only possible through data-specific analysis. That is, it is not possible to interactively perform various compliance and performance analyses directly on a process model.
In order to address the issues discussed above, we propose a tool, InterPretA (Interactive Process Analytics), which serves as a one-stop solution for performing process-centric analytics using pre-existing conformance analysis techniques. InterPretA has been implemented using the Process Analytics plug-in (http://svn.win.tue.nl/repos/prom/Packages/ProcessAnalytics), available in the nightly builds of the process mining framework ProM (http://www.promtools.org) (Dongen et al., 2005). As opposed to traditional approaches, which usually provide metrics represented by numbers that may be difficult for the user to comprehend, in our approach the user can interactively explore performance and compliance specific issues, guided by visual analytics techniques. Firstly, InterPretA enables the user to get different helicopter views on the process. That is, the results from conformance analysis can be visualized from different perspectives on the complete process. Next, InterPretA lets the user interactively explore the results from conformance analysis for certain fragments or paths in the process. This is especially useful for large process models, where the user may want to focus on analyzing only a part of the process. Finally, InterPretA also supports root cause analysis, explaining why deviations or bottlenecks occur in the process. At all times, the user can visualize and quantify this information with the help of graphs, which can be used for exploratory purposes, to answer specific questions, or to create reports.
2 BACKGROUND
The applicability of visual analytics in process mining has been explored in (Kriglstein et al., 2016; Aalst et al., 2011; Mannhardt et al., 2015). However, it is still in its nascent stages and most of the techniques focus on a broad perspective. The process model (along with the results from conformance analysis) is central to the analysis in InterPretA. In this section, we introduce the background of the elements which constitute the backbone of InterPretA.
2.1 Event Log
HIS, such as electronic health record systems, can be used to extract event logs. Events form the basic building blocks of an event log. Every event is associated with a particular case, which is identified by a case ID. An event represents the occurrence of an activity in a particular case.
Figure 2: A simple care pathway process, modeled as a Petri net. After Admission, Diagnosis is performed, following which Treatment1 is performed in parallel with either Treatment2 or Treatment3. Finally, the patient exits the care pathway, as denoted by the Release activity.
Events have a transaction type, such that for every activity instance there may be a schedule event, a start event, a suspend event, a resume event and, finally, a complete event. Each event has a timestamp associated with it, which denotes the time of occurrence of that particular event. Furthermore, each event may contain additional event-specific data attributes, for example the resource which performed the particular activity represented by the event. Similarly, each case may have case-level data attributes, such as the gender of the patient, the age of the patient, etc. The event log is thus a collection of sequences of events (so-called traces) that represent the individual cases. Event logs extracted from the HIS can be used in a process mining context for process discovery, or for conformance analysis on a pre-existing or discovered process model.
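As a concrete illustration of this structure, the following sketch (plain Python, independent of any process mining library; all case IDs, activities and attribute values are hypothetical) shows events carrying a case ID, an activity name, a timestamp, a transaction type and additional attributes, grouped into traces per case.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    case_id: str                    # the case (e.g. a patient) this event belongs to
    activity: str                   # the activity whose occurrence this event records
    timestamp: datetime             # time of occurrence
    transition: str = "complete"    # schedule, start, suspend, resume or complete
    attributes: dict = field(default_factory=dict)  # event-level data, e.g. resource

def build_traces(events):
    """Group events by case ID and order them by time: one trace per case."""
    traces = defaultdict(list)
    for e in events:
        traces[e.case_id].append(e)
    for case_events in traces.values():
        case_events.sort(key=lambda e: e.timestamp)
    return dict(traces)

log = build_traces([
    Event("patient-1", "Admission", datetime(2016, 3, 1, 9, 0), attributes={"resource": "Bob"}),
    Event("patient-1", "Diagnosis", datetime(2016, 3, 1, 10, 30)),
    Event("patient-1", "Treatment2", datetime(2016, 3, 2, 14, 0)),
])
print([e.activity for e in log["patient-1"]])   # ['Admission', 'Diagnosis', 'Treatment2']
```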
2.2 Petri Nets
Multiple notations exist to represent process models, for example BPMN, UML diagrams, EPCs and Petri nets. InterPretA uses Petri nets to represent process models. Petri nets were selected because of the properties they support, which enable detailed conformance and performance analysis. Petri nets support modeling of traditional business process concepts such as concurrency, choices and sequences (Aalst, 2016). Figure 2 shows an example of a Petri net, where places (circles) are used for modeling logic (i.e. sequence, choice, concurrency, loops, etc.) and rectangles represent the activities (tasks) of the process.
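The care pathway of Figure 2 can be encoded by listing the input and output places of each transition, after which the standard firing rule plays the token game. The sketch below is a minimal illustration; the place names are ours and not part of the original figure.

```python
# pre[t] = input places of transition t, post[t] = output places of t.
pre = {
    "Admission":  ["start"], "Diagnosis":  ["p1"],
    "Treatment1": ["p2a"],   "Treatment2": ["p2b"], "Treatment3": ["p2b"],
    "Release":    ["p3a", "p3b"],
}
post = {
    "Admission":  ["p1"],  "Diagnosis":  ["p2a", "p2b"],   # parallel split after Diagnosis
    "Treatment1": ["p3a"], "Treatment2": ["p3b"], "Treatment3": ["p3b"],
    "Release":    ["end"],
}

def enabled(marking, t):
    # A transition is enabled when every input place holds at least one token.
    return all(marking.get(p, 0) > 0 for p in pre[t])

def fire(marking, t):
    # Firing consumes a token from each input place and produces one in each output place.
    assert enabled(marking, t), f"{t} is not enabled"
    m = dict(marking)
    for p in pre[t]:
        m[p] -= 1
    for p in post[t]:
        m[p] = m.get(p, 0) + 1
    return m

m = {"start": 1}
for t in ["Admission", "Diagnosis", "Treatment2", "Treatment1", "Release"]:
    m = fire(m, t)
print({p: n for p, n in m.items() if n})   # {'end': 1}: one possible run of the model
```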
2.3 Alignments
In our approach, we use alignment-based conformance analysis, as proposed in (Adriansyah et al., 2011), for guiding the compliance and performance analysis in the context of processes. As discussed above, conformance analysis helps in determining how well a process model fits the reality represented by the event log.
Table 1: Example of conformance alignment moves using Figure 2. Steps 1, 2, 3 and 5 are synchronous moves. Step 4 is a move on log and step 6 is a move on model.

Trace in event log:     a   b   c   d   e   >>
Possible run of model:  a   b   c   >>  e   f
Steps:                  1   2   3   4   5   6
This information can be beneficial when determining compliance issues related to either the complete process or some fragments of the process. Furthermore, it can also be used to identify and analyze performance-related problems. In this subsection, we briefly discuss the idea behind the alignment strategy used for determining the conformance of a model and an event log, as proposed in (Adriansyah et al., 2011). Often, the event log may contain noisy and/or incomplete data. Alignments provide a handy way to deal with such data. Hence, instead of relying completely on the event log, we use the information from alignment-based conformance analysis as the ground truth. As alignments relate events in the event log to model elements, they are ideal for the process-oriented analysis approach supported in InterPretA. Aligning the events belonging to a trace with a process model can result in three types of so-called moves: synchronous move, move on model and move on log. An example alignment is shown in Table 1.
Synchronous move: The occurrence of an event belonging to a trace can be mapped to the occurrence of an enabled activity in the process model.
Move on model: The occurrence of an enabled activity in the process model cannot be mapped to the current event in the trace sequence.
Move on log: The occurrence of an event in the trace cannot be mapped to any enabled activity in the process model.
Optimal alignments provide a means to match a trace in an event log with a corresponding model run. If, for a trace, all the moves are synchronous or invisible model moves, then that trace can be perfectly replayed by the model.
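Using these definitions, the alignment of Table 1 can be written as pairs of log and model steps, where '>>' marks the side that is skipped. The following sketch classifies each pair accordingly:

```python
SKIP = ">>"

# The alignment of Table 1 as (log step, model step) pairs.
alignment = [("a", "a"), ("b", "b"), ("c", "c"), ("d", SKIP), ("e", "e"), (SKIP, "f")]

def move_type(log_step, model_step):
    if log_step != SKIP and model_step != SKIP:
        return "synchronous move"
    if log_step == SKIP:
        return "move on model"   # the model executes an activity with no matching event
    return "move on log"         # an event in the trace has no matching model activity

for step, (l, m) in enumerate(alignment, start=1):
    print(step, (l, m), "->", move_type(l, m))
```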
2.4 Classification
In the literature, classification techniques (Goedertier et al., 2007; Buffett and Geng, 2010; Poggi et al., 2013) have been applied in the field of process mining in order to address multiple problems. (Leoni et al., 2015) provide a framework for applying classification and correlation analysis techniques in process mining, using the results from conformance analysis. Traditionally, classification techniques in a process mining context are used to perform tasks such as abstracting event logs, identifying bottlenecks, understanding and modeling guards in data-aware process models, and annotating and clustering cohorts by splitting event logs into sub-logs. However, many applications of classification techniques in process mining focus entirely on the data perspective. As discussed above, in InterPretA a process model is central to everything. Hence, the classification task is also driven from a process perspective, enabled by the results of the conformance analysis visualized on the process model.

Figure 3: A snapshot of our interface. Section (A) is dedicated to the process views. The user can directly interact with the process model. The visualizations, in Section (B), are triggered by this user interaction and by the configuration settings in Section (C).
We use traditional classification techniques in order to perform root cause analysis, for example, to find which resource caused a delay in the process or what caused a deviation in a process. More particularly, we support the use of pre-existing feature selection techniques, which use data attribute values (either case-level or event-level attributes) in order to classify based on the desired output, for example, values above or below a certain performance threshold. Furthermore, well-known ranking techniques are used to determine the rank of each attribute based on its ability to classify according to the desired output.
Besides using traditional classification and ranking techniques, we use the context-aware process performance analysis framework recently proposed in (Hompes et al., 2016) to classify the event data using functions that take the process context into account. For example, the prefixes of activities in their trace and structural properties of the model can be used to classify activities and cases. Additionally, this technique can help rank attributes and attribute values by means of statistical analysis, as described in Section 2.5. For example, cases that lead to statistically significantly different conformance results can be classified together, and the features that lead to those classifications can be ranked.

In our approach, the user interactively selects the interesting fragments in the process model and configures the classification parameters. Based on the user selection, the alignments are either recomputed or pre-existing alignments are reused for the classification task. For the traditional classification tasks we use Weka, which supports an array of classification algorithms (Hall et al., 2009).
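As an illustration of such attribute ranking, the sketch below re-implements information gain in plain Python (InterPretA itself delegates this to Weka); the case table and the 'deviating' target are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on one attribute."""
    n = len(labels)
    by_value = {}
    for v, y in zip(values, labels):
        by_value.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

# Hypothetical case-level attributes plus the classification target, e.g.
# "deviating" = the case had a log/model move on the selected activity.
cases = [
    {"resource": "Bob",   "age_group": "60+", "deviating": "yes"},
    {"resource": "Bob",   "age_group": "<60", "deviating": "yes"},
    {"resource": "Alice", "age_group": "60+", "deviating": "no"},
    {"resource": "Alice", "age_group": "<60", "deviating": "no"},
]
labels = [c["deviating"] for c in cases]
ranking = sorted(("resource", "age_group"),
                 key=lambda a: information_gain([c[a] for c in cases], labels),
                 reverse=True)
print(ranking)   # 'resource' separates the classes perfectly, so it is ranked first
```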
2.5 Statistical Performance Analysis
Statistical analysis can be performed after classification in order to discover whether the observed differences between classes are statistically significant. For example, different resources that execute an activity might lead to significantly different execution times, cases for patients of a specific age group might take significantly longer than those for other patients, waiting times might be shorter when an activity is preceded by a certain other activity, etc. When different classes do not lead to statistically significantly different values for the chosen performance metric, the classes can be grouped in order to reduce the selection dimensionality for the user. Well-known statistical tests, such as the analysis of variance, are used here.
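For example, a one-way analysis of variance over per-resource execution times could be performed as follows; the sketch uses SciPy and entirely made-up durations.

```python
from scipy.stats import f_oneway

# Hypothetical execution times (in hours) of one activity, grouped by resource.
times_by_resource = {
    "Alice": [1.1, 0.9, 1.3, 1.0, 1.2],
    "Bob":   [2.4, 2.9, 3.1, 2.2, 2.8],
    "Carol": [1.2, 1.0, 1.4, 1.1, 0.9],
}

f_stat, p_value = f_oneway(*times_by_resource.values())
if p_value < 0.05:
    print(f"resources differ significantly (F={f_stat:.2f}, p={p_value:.4f})")
else:
    print("no significant difference; the classes could be grouped")
```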
3 PROCESS ANALYTICS
In this section, we discuss the main analyses enabled by InterPretA. As mentioned above, we focus primarily on analysis from a process perspective, as opposed to analyzing the data from a data perspective only. That is, a process model is central to the analysis enabled by our technique, and the results from conformance and performance analysis are the primary enablers of the data analysis.

Firstly, we discuss the general analysis that can be performed from a process perspective. Traditionally, process-based analysis can be used to identify what is functioning correctly in the process and to analyze where the potential problems lie in the actual executions of the process. Process-oriented analysis can be broadly categorized into:
Compliance analysis: The focus of compliance analysis is to investigate questions pertaining to auditing, adherence to business rules or protocols, etc. It should be noted that InterPretA provides easy support for detecting generic compliance issues that might exist in the process. For more sophisticated compliance analysis, we refer to specific techniques by (Ramezani Taghiabadi et al., 2013; Knuplesch et al., 2013).
Performance analysis: Performance analysis is used to analyze the performance aspect of the process execution. In particular, performance-based analyses are used to explore issues such as identifying bottlenecks in the process and the steps needed to optimize the process. The need for process optimization is discussed and motivated in (Mahaffey, 2004).
InterPretA enables both compliance-oriented and performance-oriented analysis of the process. Evidently, the conformance and performance analyses are closely tied and not actually independent of each other. Firstly, InterPretA supports helicopter views on the process, which provide a high-level overview of the behavior of the process based on the conformance analysis. Secondly, and more importantly, InterPretA allows the user to interactively explore and perform detailed analysis of the process. In the subsections that follow, we discuss the types of analysis enabled by InterPretA's interface, and the configuration options that correspond to each type of analysis. We begin with the so-called views, followed by the interactive analysis component of the tool.
3.1 Graph Views
The event data are visualized as stacked area charts and stacked bar charts. This view is represented in Figure 3B. We chose this representation because it makes comparisons more natural for the user and allows rapid discovery of outliers. For non-classification-based analysis, the X-axis (domain axis) describes the time distribution and the Y-axis describes the frequency of occurrence. The X-axis can be sorted and configured based primarily on two view types: absolute time and relative time. The absolute time view shows the actual occurrence time of a particular activity, based on the conformance analysis results. Furthermore, it is possible to abstract the absolute timescale into categories such as the day of the week or the month of the year in which the event occurred. Alternatively, the user can choose to view the X-axis as a relative time axis. That is, the graph can be plotted with the occurrence of each event relative to a chosen reference point. For example, the user can choose to view the distribution of events relative to the start of the case to which the event belongs (Figure 4), or relative to the execution of another event in the case (Figure 5).
The X-axis of the graph view thus represents a configurable absolute or relative timescale. The Y-axis, i.e. the frequency axis, is also configurable. The user can choose to plot only synchronous moves, only log moves, or both synchronous and log moves in the alignment. Viewing both synchronous and log moves essentially shows the information from the complete event log. Figures 4 and 5 provide examples of such configuration settings.

The different graph views lead to interesting analysis perspectives on the data. For example, one popular type of analysis enabled by our approach is concept drift analysis. Concept drift analysis allows the user to identify how a process changes over a period of time. Using the absolute timescale, the user can analyze how event patterns change over time.
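A sketch of how the relative-time view of Figure 4 can be computed: each event (here taken from the synchronous and log moves) is offset against the start of its case and bucketed per week. The records and offsets below are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical (case id, activity, timestamp) records.
events = [
    ("case1", "Diagnosis",  datetime(2016, 1, 4)),
    ("case1", "Treatment1", datetime(2016, 1, 20)),
    ("case2", "Diagnosis",  datetime(2016, 2, 1)),
    ("case2", "Treatment1", datetime(2016, 3, 7)),
]

# Start of each case = timestamp of its earliest event.
case_start = {}
for case_id, _, ts in events:
    case_start[case_id] = min(ts, case_start.get(case_id, ts))

# Relative-time histogram: activity -> week offset since case start -> frequency.
histogram = defaultdict(Counter)
for case_id, activity, ts in events:
    week = (ts - case_start[case_id]).days // 7
    histogram[activity][week] += 1

print({a: dict(c) for a, c in histogram.items()})
# {'Diagnosis': {0: 2}, 'Treatment1': {2: 1, 5: 1}}
```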
3.2 Process Views
We support three types of process views that can be used for interactive analysis by the user (see Figure 3A). These are explained in the following subsections.
Figure 4: Graph view showing the distribution of selected activities from the process model (not shown here) since the start of the case. Both synchronous and log moves are considered here, as evident from the configuration settings. The color of each bar in the histogram corresponds to a selected activity from the process.
3.2.1 Frequency View
The frequency view allows the user to get an overview of the actual occurrence frequencies in the overall process (e.g. Figure 6). The frequencies of occurrence of an activity are obtained from the alignments projected on the process model. The frequency view encodes the frequency of occurrence of activities in shades of blue, as projected on the synchronous moves in the alignments. Darker and lighter shades of blue represent frequent and infrequent activities, respectively. The user can easily identify the common and uncommon activities or fragments in the process based on the background color of the activities in the process model. The following options are available to configure the view on frequencies (a small computation sketch follows the list):
Absolute View: Calculates the frequencies based on absolute numbers. For example, the color of an activity is determined by the absolute number of synchronous moves; the activity with the most synchronous moves is colored the darkest. This view is similar to the default view of the inductive visual miner (Leemans et al., 2014).
Average Occurrence per Case View: Calculates the frequencies based on the average number of occurrences per case. The color of an activity is determined by the number of times the activity has a synchronous move in a case, normalized over the log. Hence, in the case of a loop, if the activity has many synchronous moves within a trace, for many cases in the event log, then this activity is colored the darkest. Similarly, if an activity has no synchronous moves (i.e. no corresponding event in the event log), then it is colored the lightest.
Number of Cases: Calculates the frequencies based on the number of cases in which a synchronous move occurred. The color of an activity is determined by the ratio of the number of cases with a synchronous move for that activity to the total number of cases in the event log.
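The three configurations above can be computed directly from the synchronous moves in the alignments, as in the following sketch; the per-case move lists are hypothetical.

```python
from collections import Counter

# Hypothetical synchronous moves per case: case id -> activities with a synchronous move.
sync_moves = {
    "case1": ["Admission", "Diagnosis", "Treatment1", "Release"],
    "case2": ["Admission", "Diagnosis", "Treatment2", "Treatment2", "Release"],
    "case3": ["Admission", "Diagnosis"],
}
n_cases = len(sync_moves)

absolute = Counter(a for moves in sync_moves.values() for a in moves)           # Absolute View
avg_per_case = {a: c / n_cases for a, c in absolute.items()}                    # Average Occurrence per Case
cases_with = Counter(a for moves in sync_moves.values() for a in set(moves))
case_ratio = {a: c / n_cases for a, c in cases_with.items()}                    # Number of Cases

# A darker shade of blue corresponds to a higher value of the chosen metric.
print(absolute["Treatment2"], round(avg_per_case["Treatment2"], 2), round(case_ratio["Treatment2"], 2))
# 2 0.67 0.33
```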
3.2.2 Fitness of Model and Log View
The fitness of model and log view is derived from the synchronous and model moves, together with the log moves, in the process model. This view represents the fitness between the event log and the process model. The activities are colored in shades of green. The actual color of an activity depends on the ratio:

    #Synchronous moves / (#Model moves + #Synchronous moves)    (1)

Hence, a darker shade of green for an activity implies that the particular activity had more synchronous moves than model moves, i.e. it is described well by the model. As with the frequency view, the fitness of model and log view is configurable. That is, the activities can be colored based on the ratio of absolute occurrences of synchronous and model moves, the ratio of average occurrences of synchronous and model moves per case, or the ratio of the number of cases in which synchronous and model moves occurred.
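The ratio of Eq. (1) can be computed per activity from the move counts in the alignments, as in this short sketch with hypothetical counts.

```python
# Hypothetical per-activity move counts taken from the alignments.
moves = {
    "Diagnosis":  {"synchronous": 280, "model": 10},
    "Treatment3": {"synchronous": 95,  "model": 60},
}

def fitness_shade(counts):
    """Eq. (1): values closer to 1.0 correspond to a darker shade of green."""
    sync, model = counts["synchronous"], counts["model"]
    return sync / (model + sync) if (model + sync) else 1.0

for activity, counts in moves.items():
    print(activity, round(fitness_shade(counts), 2))   # Diagnosis 0.97, Treatment3 0.61
```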
3.2.3 Performance View
The frequency and fitness views enable the user to easily investigate the common and uncommon paths and the deviations in the process. The performance view, in turn, allows the user to investigate the stages of the process where bottlenecks occur, or where there is room for improvement. In the performance view, the activities are colored in shades of red. The shade depends on the execution time between the immediately preceding synchronous activity and the current synchronous activity for every case. A darker shade of red implies that the synchronous occurrences of the activity were executed after a long delay since the completion of the previous synchronous activity. As with the fitness of model and log view, the user can configure different views (absolute, average occurrence per case or number of cases) on the performance dimension.
It should be noted that the transactional information of an activity (e.g. events representing the lifecycle of an activity) can also be used for performance analysis. That is, the lifecycle information of an activity is useful for detecting the actual execution time of an activity. However, we do not show this information in the helicopter performance view, as event logs often contain only complete events, i.e. only one event per activity instance. When lifecycle information is present, additional performance statistics are calculated and can be shown.
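A sketch of how such waiting times can be derived from the alignments: per case, the time between consecutive synchronous moves is attributed to the later activity. The timestamps are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical timestamps of synchronous moves per case, in alignment order.
sync_timeline = {
    "case1": [("Admission",  datetime(2016, 1, 4, 9, 0)),
              ("Diagnosis",  datetime(2016, 1, 4, 11, 0)),
              ("Treatment1", datetime(2016, 1, 18, 14, 0))],
}

waits = defaultdict(list)   # activity -> waiting times (hours) since previous synchronous move
for moves in sync_timeline.values():
    for (_, prev_ts), (activity, ts) in zip(moves, moves[1:]):
        waits[activity].append((ts - prev_ts).total_seconds() / 3600)

# A darker shade of red corresponds to a longer average waiting time.
for activity, hours in waits.items():
    print(activity, round(sum(hours) / len(hours), 1), "hours")   # Diagnosis 2.0, Treatment1 339.0
```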
3.3 Interactive Analysis
The graph views and process views enable analysis of the process from a high level. In a typical process analytics setting, the user is also interested in performing a detailed root cause analysis. In this section, we describe how InterPretA supports the user in performing such detailed analysis, starting from the high-level views. Five high-level tasks (Schulz et al., 2013) have been identified to perform the interactive analysis.
Task 1: Deviation Analysis. The fitness of model and log view provides a good overview of how well the data and the model fit, and where the possible deviations are located in the process. The next step is to investigate what causes these deviations. For example, suppose that the conformance analysis indicates that a particular task is sometimes skipped (move on model). This could be explained by the attributes associated with the case, for example the resources performing the task.

In order to investigate what or who causes deviations in the process, classification analysis is used.
Figure 7: Compliance analysis for resources involved in activities. When resource 'Bob' is involved in activity 'A', the case fitness to the model is significantly lower.
The user can select the desired activity from the activity drop-down list and configure the desired classification settings, in order to classify what or who causes the synchronous, log and model moves for that particular activity. Here, we make use of the context-aware performance analysis framework proposed in (Hompes et al., 2016) to provide and rank the statistically significant attributes (i.e. the process context). Many combinations of classification and performance/conformance functions are automatically generated from a collection of contexts. Then, statistical testing is automatically performed to verify whether different classes (contexts) lead to statistically significantly different results; e.g. in Figure 7, we see that Bob has a high variability in activity 'A'. For those combinations where this is the case, the specific classification is returned to the user, as it might lead to interesting insights. For example, Figure 7 shows an example plot of the fitness of a model, split by the different resources that can perform activity 'A' in the process. From this example, we might deduce that the fitness of a case to the model depends highly on which resource executes activity 'A'. The statistical analysis tells us which resource(s) lead to significantly different results.
Task 2: Bottleneck Analysis. The performance view provides an overview of the time spent between activities. For example, it may be the case that a particular path in the process is much slower than the other paths. The user might be interested in finding out how a particular path with delays differs from the other paths in the process. To find this out, the user can select the activities from the process model that correspond to the path with delays.
Figure 8: Bottleneck analysis for activities preceding a bottleneck activity 'D'. When activity 'C' is not performed before activity 'D', the duration is significantly longer.
Next, the user can perform classification analysis, with the output of the classification set to fitting versus non-fitting cases. The fitting cases in the output are the ones that have synchronous moves for all the selected activities. Alternatively, the user might be interested in finding all the cases in the process (or on a path in the process) that are executed within a certain time frame. For such analysis, the classification output can be a threshold on the time taken, such that the attributes which best explain the performance (e.g. above or below the threshold) are identified by the classification techniques. Alternatively, we automatically verify whether different contexts lead to significant differences in performance. For example, Figure 8 shows an example where the duration of an activity is significantly different for different prefixes in the trace. From this example, we might conclude that activity 'C' should be made mandatory.
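For the time-threshold variant of this task, the classification target can be obtained by simply labeling each case against the chosen threshold, after which the attribute ranking of Section 2.4 applies. The durations and the threshold below are hypothetical.

```python
from datetime import timedelta

# Hypothetical case durations along the selected path.
durations = {"case1": timedelta(days=12), "case2": timedelta(days=40), "case3": timedelta(days=9)}
threshold = timedelta(days=21)

# Binary classification target: the attributes of the 'slow' cases are the ones to explain.
labels = {case: ("slow" if d > threshold else "fast") for case, d in durations.items()}
print(labels)   # {'case1': 'fast', 'case2': 'slow', 'case3': 'fast'}
```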
Figure 9: Graph showing the concept drift and changes among the different modules in the LUMC dataset. The colors in the graph correspond to the activities belonging to a particular module. For the first 7 months, all the activities belonged to one module type. There is a steep fall in module usage towards the end of the first year and a peak in module usage in year 2.

Task 3: Frequency-oriented Compliance Analysis. The frequency view on the process model gives a good overview of the frequency distribution of activities in the overall process. This view is already useful for answering some compliance questions, such as whether or not an activity occurs at least once in every case. However, the user might also be interested in interactively investigating some non-trivial KPIs, such as whether the occurrence of a particular activity triggers the occurrence of another activity. To enable such analysis, the user can select a set of activities, and the appropriate configuration settings, to check the co-occurrences and the frequencies of co-occurrence of the selected activities. Additionally, the user can view differences in performance for the different classes.
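A sketch of such a co-occurrence check over the synchronous moves; the per-case activity sets are hypothetical.

```python
# Hypothetical sets of activities with a synchronous move, per case.
sync_activities = {
    "case1": {"Admission", "Diagnosis", "Treatment1", "Release"},
    "case2": {"Admission", "Diagnosis", "Treatment2"},
    "case3": {"Admission", "Treatment1", "Release"},
}

selected = {"Diagnosis", "Treatment1"}   # the activities selected by the user
co_occurring = [case for case, acts in sync_activities.items() if selected <= acts]
print(f"{len(co_occurring)} of {len(sync_activities)} cases contain all selected activities")
# 1 of 3 cases contain all selected activities
```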
Task 4: Performance-oriented Compliance Analysis. The performance view allows the user to easily investigate the overall time between activities and where the bottlenecks in the process may be. However, there may be time-critical KPI analyses that the user is interested in. For example, a certain activity 'B' should be performed within x hours after executing an activity 'A'. The user can select activity 'A' and activity 'B' (among others, if needed) to visualize the distribution of occurrence times of 'B' with respect to 'A'. Different contexts can be tested for the significance of their impact on the time between the two activities.
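A sketch of such a time-critical KPI check, here "B should complete within 24 hours after A"; the case data and the limit are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical completion times of activities 'A' and 'B' per case.
completions = {
    "case1": {"A": datetime(2016, 5, 2, 9, 0), "B": datetime(2016, 5, 2, 15, 0)},
    "case2": {"A": datetime(2016, 5, 3, 9, 0), "B": datetime(2016, 5, 5, 10, 0)},
    "case3": {"A": datetime(2016, 5, 4, 9, 0)},                    # 'B' never occurred
}

def meets_kpi(acts, limit=timedelta(hours=24)):
    a, b = acts.get("A"), acts.get("B")
    return a is not None and b is not None and timedelta(0) <= b - a <= limit

compliant = [case for case, acts in completions.items() if meets_kpi(acts)]
print(compliant)   # ['case1']
```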
Task 5: Process Fragmentation. The user might also be interested in exploring certain fragments of a process. That is, instead of considering the alignments on the complete process model, the user might only be interested in investigating how the process behaves with respect to only a few activities. To achieve this, the user can select the interesting activities and re-compute the alignments only for those activities. The re-computed alignment considers the moves only for the selected activities; all other moves are either ignored or considered model moves with zero cost. Based on the fragmented process view and the alignments on the filtered model, all the previous analyses can be repeated. An example of such a task can be derived from Figure 5, wherein the user selects some activities (marked blue). The user may then re-compute the alignments only for the selected activities.
4 CASE STUDY
As a part of evaluating our approach, we perform analysis on a real-life dataset from a diabetes treatment care process, provided by a Dutch hospital, the Leiden University Medical Center (LUMC). In order to cope with unstructured processes, as discussed in (Fernandez-Llatas et al., 2015), LUMC has proposed and rolled out six specialized modules as part of its diabetes treatment process. Each module can be viewed as a separate process, each consisting of a series of appointments, some of which may be skipped and some of which may occur in any order. The HIS used by LUMC records all the relevant information about the appointments in each of these modules. The event log created from the HIS spans more than 2.5 years and involves almost 300 cases (patients). The data were anonymized using pseudonymized patient IDs. Using this information, we perform exploratory analysis and also answer some compliance and performance related questions. Rather than splitting the log into sub-logs corresponding to each module type, we use the complete event log and apply the inductive miner infrequent process discovery algorithm (Leemans and Aalst, 2014) to discover a single process model containing all the modules.
4.1 Analyzing Change of Modules
By plotting the data for all events of all modules, it is easy to visualize the patterns of change in the types of module over time. The domain axis was sorted to contain the absolute time span from lowest to highest, represented in months. From Figure 9, it can easily be concluded that for the first few months (approximately the first 7 months), only one module existed, as all the activities within this time span belong to the same module type.
Figure 10: The top three ranked attribute classifiers for the classification analysis based on the answers to the question 'Were the personal goals met?'. The top-ranked attribute classifier according to the ranking algorithm was zorgpad (i.e. module type).
From Figure 9 it can also be seen that this module persisted over time, but was gradually replaced by the other modules. Another interesting pattern that can be observed from Figure 9 is the decrease in the number of events towards the end of the timeline. A logical explanation for this pattern is that, since the appointments (events) for modules are usually scheduled in the future, fewer appointments have been scheduled for the more distant future.
4.2 Classification Analysis
At the end of each module, the patients are asked to fill in a questionnaire, evaluating the patient's treatment plan and their understanding of and satisfaction with the module. It is interesting to analyze the outcome of the treatment based on the answers in this survey. For one of the questions, 'Were the personal goals met?', we perform classification analysis with the information gain classifier, using the answers to the question (Yes, No or NA) as output. The top three ranked attributes which best classify the output are shown in Figure 10. It is, however, important to note that the module overname is a basic module (to meet the diabetic team members), and hence the majority of the patients in this module have not reached their goals yet, thereby having the corresponding value NA, as shown in Figure 10. We select the top-ranked attribute classifier, found to be the module type. For the majority of patients, no value was recorded for this question (value NA). However, for the patients who did fill in the survey, one prominent outcome, as evident from Figure 11, is that for the module Optimalisatie glucoseregulatie, almost twice as many patients did not fully meet their expectations. This suggests that another module might have been more suitable, or it calls for improvement of the module to better manage patient expectations.
4.3 Compliance Analysis - Time Perspective

Next, we focus on one individual module to perform some preliminary compliance analysis. The expectation is that every appointment is completed within a certain time frame. We select a particular appointment type from the chosen module and plot a histogram whose domain axis shows the time in weeks since the beginning of the case. Ideally, the chosen activity should be completed within 9-10 weeks after the start of the case. As becomes clear from Figure 12, in reality this activity is mostly completed before the expected time, thereby meeting the desired KPI for the majority of the patients.
5 CONCLUSION AND FUTURE WORK
In this paper, we introduced a novel tool for enabling interactive process-oriented data analysis. The tool builds upon and brings together existing techniques from the process mining, data mining and visual analytics fields to enable interactive process analysis. It supports exploratory analysis through different helicopter views on the process. In contrast to existing approaches, it is highly interactive and can be used to perform root cause analysis for problems in the process. The application areas of the tool were broadly categorized, and the tool was used to analyze a real-life dataset. The tool relies on traditional ways of representing the data (e.g. histograms) for process analytics.
Figure 11: Zoomed-in version showing the responses Yes and No for the top-ranked classifier zorgpad (module type), based on the answers to the question 'Were the personal goals met?'.
Figure 12: Compliance analysis for completion of the activity.
In the future, we aim to support more interactive data exploration, for example from the plotted graph views. Currently, the tool is limited to read-only data plots in terms of histograms and stacked charts. Drill-down and roll-up analysis could be introduced to support more data-oriented task analysis. Furthermore, the impact on the current process could be visualized based on the interaction of the user with the data from the graph view. This could also lead to understanding the configurability of the process based on certain cohorts. Another research direction is to introduce more classification and/or correlation strategies. Currently, we consider only one classifier attribute at a time. An obvious next step is to also consider the combined impact of multiple attributes on the output variable for classification. Here, statistical analysis would also be a beneficial addition.
ACKNOWLEDGEMENTS
The authors would like to thank LUMC for providing the problems and the access to the data. The authors especially thank Drs. Marielle Schroijen and A.J. (Thomas) Schneider, both from LUMC, for their efforts, thorough help and timely feedback throughout the analysis. Furthermore, the authors would like to thank Philips Research and Eindhoven University of Technology for funding this research.
REFERENCES
Aalst, W. M. P. v. d. (2016). Process Mining - Data Science in Action, Second Edition. Springer.
Aalst, W. M. P. v. d., de Leoni, M., and ter Hofstede, A. H. (2011). Process mining and visual analytics: Breathing life into business process models.
Adriansyah, A., Dongen, B. F. v., and Aalst, W. M. P. v. d. (2011). Towards robust conformance checking. In Business Process Management Workshops, volume 66 of Lecture Notes in Business Information Processing, pages 122–133. Springer Berlin Heidelberg.
Buffett, S. and Geng, L. (2010). Using classification methods to label tasks in process mining. Journal of Software Maintenance and Evolution: Research and Practice, 22(6-7):497–517.
Dongen, B. F. v., de Medeiros, A. K. A., Verbeek, H. M. W., Weijters, A. J. M. M., and Aalst, W. M. P. v. d. (2005). The ProM framework: A new era in process mining tool support. In International Conference on Application and Theory of Petri Nets, pages 444–454. Springer.
Dwivedi, A., Bali, R., James, A., and Naguib, R. (2001). Workflow management systems: the healthcare technology of the future? In Engineering in Medicine and Biology Society, 2001. Proceedings of the 23rd Annual International Conference of the IEEE, volume 4, pages 3887–3890. IEEE.
Fernandez-Llatas, C., Martinez-Millana, A., Martinez-Romero, A., Benedí, J. M., and Traver, V. (2015). Diabetes care related process modelling using process mining techniques. Lessons learned in the application of interactive pattern recognition: coping with the spaghetti effect. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 2127–2130.
Goedertier, S., Martens, D., Baesens, B., Haesen, R., and Vanthienen, J. (2007). Process mining as first-order classification learning on logs with negative events. In International Conference on Business Process Management, pages 42–53. Springer.
Graeber, S. (1997). The impact of workflow management systems on the design of hospital information systems. In Proceedings of the AMIA Annual Fall Symposium, page 856. American Medical Informatics Association.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The WEKA data mining software: An update. SIGKDD Explor. Newsl., 11(1):10–18.
Hompes, B. F. A., Buijs, J. C. A. M., and Aalst, W. M. P. v. d. (2016). A Generic Framework for Context-Aware Process Performance Analysis, pages 300–317. Springer International Publishing, Cham.
Knuplesch, D., Reichert, M., Ly, L. T., Kumar, A., and Rinderle-Ma, S. (2013). Visual Modeling of Business Process Compliance Rules with the Support of Multiple Perspectives, pages 106–120. Springer Berlin Heidelberg, Berlin, Heidelberg.
Kriglstein, S., Pohl, M., Rinderle-Ma, S., and Stallinger, M. (2016). Visual Analytics in Process Mining: Classification of Process Mining Techniques. In Andrienko, N. and Sedlmair, M., editors, EuroVis Workshop on Visual Analytics (EuroVA). The Eurographics Association.
Leemans, S. J. J., Fahland, D., and Aalst, W. M. P. v. d. (2014). Discovering block-structured process models from event logs containing infrequent behaviour. In Business Process Management Workshops, pages 66–78. Springer.
Leemans, S., Fahland, D., and Aalst, W. M. P. v. d. (2014). Process and deviation exploration with inductive visual miner.
Leoni, M. d., Maggi, F., and Aalst, W. M. P. v. d. (2015). An alignment-based framework to check the conformance of declarative process models and to preprocess event-log data. Information Systems, 47:258–277.
Mahaffey, S. (2004). Optimizing patient flow in the enterprise. Health management technology, 25(8):34–37.
Mannhardt, F., de Leoni, M., and Reijers, H. (2015). The multi-perspective process explorer. In Proceedings of the BPM Demo Session 2015, co-located with the 13th International Conference on Business Process Management (BPM 2015), Innsbruck, Austria, September 2, 2015, pages 130–134. CEUR Workshop Proceedings.
Munoz-Gama, J., Carmona, J., and Aalst, W. M. P. v. d. (2014). Single-entry single-exit decomposed conformance checking. Information Systems, 46.
Poggi, N., Muthusamy, V., Carrera, D., and Khalaf, R. (2013). Business process mining from e-commerce web logs. In Business Process Management, pages 65–80. Springer.
Ramezani Taghiabadi, E., Fahland, D., Dongen, B. F. v., and Aalst, W. M. P. v. d. (2013). Diagnostic Information for Compliance Checking of Temporal Compliance Requirements, pages 304–320. Springer Berlin Heidelberg, Berlin, Heidelberg.
Schulz, H.-J., Nocke, T., Heitzler, M., and Schumann, H. (2013). A design space of visualization tasks. IEEE Transactions on Visualization and Computer Graphics, 19(12):2366–2375.
Sutherland, J. and van den Heuvel, W. (2006). Towards an intelligent hospital environment: Adaptive workflow in the OR of the future. In Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06), volume 5, pages 100b–100b.
Weigl, M., Müller, A., Vincent, C., Angerer, P., and Sevdalis, N. (2012). The association of workflow interruptions and hospital doctors' workload: a prospective observational study. BMJ quality & safety, 21(5):399–407.