The idea of process mining is to discover, monitor, and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs. We consider three basic types of process mining (Figure 1): (1) discovery, (2) conformance, and (3) extension.
Discovery: Traditionally, process mining has focused on discovery, i.e., deriving information about the original process model, the organizational context, and execution properties from enactment logs. An example of a technique addressing the control-flow perspective is the α-algorithm (van der Aalst et al., 2004), which constructs a Petri net model describing the behavior observed in the event log. It is important to mention that there is no a-priori model; instead, a model is constructed from the event log.
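To make the discovery step concrete, the following is a minimal sketch of the main steps of the α-algorithm on toy traces: it derives the ordering relations (causality →, parallelism ||, choice #) from the directly-follows relation and then finds the maximal pairs of activity sets (A, B) that become places of the Petri net. The function names and the toy log are illustrative assumptions; the sketch does not build a Petri net object and, like the basic α-algorithm, does not handle short loops.

```python
from itertools import chain, combinations


def ordering_relations(log):
    """Footprint of the log: causal (->), parallel (||) and choice (#) pairs."""
    follows = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    activities = sorted({a for trace in log for a in trace})
    causal = {(a, b) for (a, b) in follows if (b, a) not in follows}
    parallel = {(a, b) for (a, b) in follows if (b, a) in follows}
    choice = {(a, b) for a in activities for b in activities
              if (a, b) not in follows and (b, a) not in follows}
    return activities, causal, parallel, choice


def nonempty_subsets(items):
    return chain.from_iterable(combinations(items, r) for r in range(1, len(items) + 1))


def alpha_miner(log):
    """Return start activities, maximal place pairs (A, B), and end activities."""
    activities, causal, _, choice = ordering_relations(log)

    def pairwise_choice(group):
        # every two members of the group (a member with itself included) are in #
        return all((x, y) in choice for x in group for y in group)

    # candidate places: every a in A causally precedes every b in B
    candidates = [
        (frozenset(A), frozenset(B))
        for A in nonempty_subsets(activities) if pairwise_choice(A)
        for B in nonempty_subsets(activities) if pairwise_choice(B)
        if all((a, b) in causal for a in A for b in B)
    ]
    # keep only pairs not strictly contained in another candidate
    maximal = {
        (A, B) for (A, B) in candidates
        if not any(A <= A2 and B <= B2 and (A, B) != (A2, B2) for (A2, B2) in candidates)
    }
    start = {trace[0] for trace in log}
    end = {trace[-1] for trace in log}
    return start, maximal, end


start, places, end = alpha_miner([list("abcd"), list("acbd"), list("aed")])
```

For this three-trace log, the sketch finds the places ({a},{b,e}), ({a},{c,e}), ({b,e},{d}) and ({c,e},{d}), with a as the only start and d as the only end activity. Enumerating all subsets is exponential, which is acceptable only for small illustrative logs.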
However, process mining is not limited to process models (i.e., control flow), and recent process mining techniques increasingly focus on other perspectives, e.g., the organizational perspective, the performance perspective, or the data perspective. For example, there are approaches to extract social networks from event logs and analyze them using social network analysis (van der Aalst et al., 2005). This allows organizations to monitor how people, groups, or software/system components work together. There are also approaches to visualize performance-related information, e.g., approaches that graphically show bottlenecks and all kinds of performance indicators, such as the average/variance of the total flow time or the time spent between two activities.
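The organizational and performance perspectives can both be illustrated on the same small log. The sketch below assumes a hypothetical event format (case id, activity, resource, timestamp) and computes a simplified handover-of-work count (how often one resource's event is directly followed by another resource's event in the same case), together with the mean and variance of the total flow time and the time between two given activities. It is a simplified illustration, not the full set of social network metrics of van der Aalst et al. (2005).

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean, pvariance

# Hypothetical toy log: (case id, activity, resource, timestamp)
EVENTS = [
    ("c1", "register", "ann", "2005-01-03 09:00"),
    ("c1", "examine", "bob", "2005-01-03 11:00"),
    ("c1", "decide", "ann", "2005-01-04 10:00"),
    ("c2", "register", "ann", "2005-01-05 08:00"),
    ("c2", "examine", "carl", "2005-01-05 14:00"),
    ("c2", "decide", "ann", "2005-01-07 08:00"),
]


def traces(events):
    """Group events by case and sort each case by timestamp."""
    by_case = defaultdict(list)
    for case, activity, resource, ts in events:
        by_case[case].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity, resource))
    return {case: sorted(evts) for case, evts in by_case.items()}


def handover_of_work(log):
    """Count direct successions (r1, r2) between the resources of consecutive events."""
    handovers = Counter()
    for evts in log.values():
        for (_, _, r1), (_, _, r2) in zip(evts, evts[1:]):
            handovers[(r1, r2)] += 1
    return handovers


def flow_time_stats(log):
    """Mean and population variance of the total flow time per case, in hours."""
    hours = [(evts[-1][0] - evts[0][0]).total_seconds() / 3600 for evts in log.values()]
    return mean(hours), pvariance(hours)


def time_between(log, a, b):
    """Hours from the first occurrence of a to the next occurrence of b, per case."""
    result = {}
    for case, evts in log.items():
        t_a = next((t for t, act, _ in evts if act == a), None)
        if t_a is None:
            continue
        t_b = next((t for t, act, _ in evts if act == b and t >= t_a), None)
        if t_b is not None:
            result[case] = (t_b - t_a).total_seconds() / 3600
    return result


log = traces(EVENTS)
```

On this toy log the flow times are 25 and 48 hours, giving a mean of 36.5 hours and a population variance of 132.25; in a real analysis, the handover counts would form the weighted edges of the social network.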
Conformance: Here, there is an a-priori model, which is used to check whether reality conforms to it. For example, a process model may state that purchase orders of more than one million euros require two checks. Another example is checking the so-called “four-eyes” principle, which requires that certain pairs of activities are not executed by the same person. Conformance checking may be used to detect deviations, to locate and explain them, and to measure their severity.
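The two example rules can be checked directly against recorded cases. The sketch below is a rule-based illustration rather than model-based conformance checking (such as token replay against a Petri net): it assumes hypothetical purchase-order cases with an amount in euros and a list of (activity, resource) pairs, flags violations of the double-check rule and of the four-eyes principle (here assumed to apply to the hypothetical activities "create" and "approve"), and reports the share of compliant cases as a crude severity measure.

```python
# Hypothetical cases: case id -> (order amount in euros, [(activity, resource), ...])
CASES = {
    "po1": (2_500_000, [("create", "ann"), ("check", "bob"), ("check", "carl"), ("approve", "dave")]),
    "po2": (1_800_000, [("create", "ann"), ("check", "bob"), ("approve", "ann")]),
    "po3": (40_000, [("create", "eve"), ("check", "bob"), ("approve", "dave")]),
    "po4": (3_000_000, [("create", "eve"), ("check", "bob"), ("approve", "dave")]),
}


def violates_four_eyes(events, a, b):
    """True if some resource executed both activity a and activity b in this case."""
    by_a = {r for act, r in events if act == a}
    by_b = {r for act, r in events if act == b}
    return bool(by_a & by_b)


def violates_double_check(amount, events, threshold=1_000_000, required=2):
    """True if an order above the threshold has fewer than `required` checks."""
    checks = sum(1 for act, _ in events if act == "check")
    return amount > threshold and checks < required


def conformance_report(cases):
    """Map each deviating case to the rules it breaks; also return the compliant share."""
    deviations = {}
    for case, (amount, events) in cases.items():
        broken = []
        if violates_four_eyes(events, "create", "approve"):
            broken.append("four-eyes")
        if violates_double_check(amount, events):
            broken.append("double-check")
        if broken:
            deviations[case] = broken
    compliant_share = 1 - len(deviations) / len(cases)
    return deviations, compliant_share


deviations, compliant_share = conformance_report(CASES)
```

Here po2 breaks both rules and po4 lacks a second check, so half of the cases comply. Reporting which rule each case breaks supports the locating and explaining of deviations mentioned above.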
Extension: Here, there is also an a-priori model. This model is extended with a new aspect or perspective, i.e., the goal is not to check conformance but to enrich the model with the data in the event log. An example is the extension of a process model with performance data, i.e., bottlenecks are projected onto an a-priori process model.
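As a simple illustration of projecting performance data onto an existing model, the sketch below assumes that the a-priori model is given only by its arcs (activity, next activity), a deliberate simplification of a Petri net, and that each trace records completion times in hours since the start of the case. Every observed step that matches a model arc contributes its delay to that arc; the arc with the largest mean delay is reported as the bottleneck. Model, activity names, and times are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical a-priori model, given only as its arcs (activity -> next activity)
MODEL_ARCS = {("register", "examine"), ("examine", "decide"), ("decide", "notify")}

# Hypothetical traces: (activity, completion time in hours since case start)
LOG = [
    [("register", 0), ("examine", 2), ("decide", 30), ("notify", 31)],
    [("register", 0), ("examine", 5), ("decide", 20), ("notify", 22)],
]


def project_delays(model_arcs, log):
    """Annotate each model arc with the mean delay observed between its two activities."""
    delays = defaultdict(list)
    for trace in log:
        for (a, t_a), (b, t_b) in zip(trace, trace[1:]):
            if (a, b) in model_arcs:  # only project steps that the model contains
                delays[(a, b)].append(t_b - t_a)
    return {arc: mean(ds) for arc, ds in delays.items()}


annotated = project_delays(MODEL_ARCS, LOG)
bottleneck = max(annotated, key=annotated.get)
```

The model itself is left unchanged; the log only adds a performance annotation to it, which is the difference between extension and conformance checking.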
At this point in time, there are mature tools such as the ProM framework (van der Aalst et al., 2007b), featuring an extensive set of analysis techniques that can be applied to real-life logs while supporting the whole spectrum depicted in Figure 1.
3 HEALTHCARE PROCESS
In this section, we show the applicability of process mining in healthcare. Obtaining data related to healthcare processes is not an easy task: several organizational units can be involved in the treatment of a patient, and these units often have their own specific IT applications. In spite of this, systems used in hospitals need to provide an integrated view on all these IT applications, as it must be guaranteed that the hospital gets paid for every service delivered to a patient. Consequently, these kinds of systems contain process-related information about healthcare processes and are therefore interesting candidates for providing the data needed for process mining.
To this end, as a case study showing the applicability of process mining in healthcare, we use raw data collected by the billing system of the AMC hospital. This raw data contains information about a group of 627 gynecological oncology patients treated in 2005 and 2006, for whom all diagnostic and treatment activities have been recorded. The process for gynecological oncology patients is supported by several different departments, e.g., gynecology, radiology, and several labs.
For this data set, we extracted event logs from the AMC’s databases, where each event refers to a service delivered to a patient. As the data comes from a billing system, we face the interesting problem that, for each service delivered to a patient, only the day on which it was delivered is known. In other words, we do not have any information about the actual timestamps of the start and completion of the service. Consequently, the order in which same-day events appear in the log does not necessarily match the order in which they were executed.
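The effect of day-level timestamps can be made explicit by treating the services of one day as an unordered group instead of trusting the order in which they appear in the log. The sketch below assumes a hypothetical record format of (day, service) pairs for a single patient; it groups the services per day and computes an upper bound on the number of execution orders that are consistent with the data (the product of the factorials of the group sizes; it is an upper bound because repeated services within a day would be counted more than once).

```python
from collections import defaultdict
from datetime import date
from math import factorial, prod

# Hypothetical billing records for one patient: (day, service), no time of day
RECORDS = [
    (date(2005, 3, 1), "first visit"),
    (date(2005, 3, 1), "blood test"),
    (date(2005, 3, 1), "ultrasound"),
    (date(2005, 3, 8), "MRI"),
    (date(2005, 3, 15), "consultation"),
    (date(2005, 3, 15), "blood test"),
]


def day_buckets(records):
    """Group a patient's services by day; the order within a day is unknown."""
    by_day = defaultdict(list)
    for day, service in records:
        by_day[day].append(service)
    # days are ordered; services inside a day are sorted only for a stable display
    return [sorted(by_day[day]) for day in sorted(by_day)]


def possible_orderings(buckets):
    """Upper bound on the number of execution orders consistent with day-level data."""
    return prod(factorial(len(bucket)) for bucket in buckets)


buckets = day_buckets(RECORDS)
```

For these six records, only the order between days is certain, and up to 3! · 1! · 2! = 12 sequences are compatible with the data, of which the logged sequence is just one.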
Nevertheless, as the log contains real data about the services delivered to gynecological oncology patients, it is still an interesting and representative data set for showing the applicability of process mining in healthcare, since many techniques can still be applied. Note that the log contains 376 different event names, which indicates that we are dealing with a non-trivial careflow process.
In the remainder of this section, we focus on obtaining, in an explorative way, insights into the gynecological oncology healthcare process. Hence, we only consider the discovery part of process mining, not the conformance and extension parts.
Furthermore, obtaining these insights should not be