above can be reflected by the existence of several clusters of activities in an event log, where the activities in the same cluster are densely connected and the activities in different clusters are sparsely connected (this is also the assumption underlying our method). For instance, in Figure 1 an event log E contains 22 activities, and a causal activity graph G can be built by using the activities from E as vertices and the causal relations (Hompes et al., 2014) among these activities as edges. The causal activity graph and the causal relations between activities are defined in detail in Section 3.1. As Figure 1 shows, the vertices in G can be grouped into three clusters based on the edge structure, such that there are many edges within each cluster and relatively few edges between the clusters.
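The clustering criterion above (many edges inside clusters, few between them) can be made concrete by counting edges for a candidate grouping. The following Python sketch is illustrative only. Because the causal relations of Hompes et al. (2014) are defined in Section 3.1 and not reproduced here, it assumes a simple stand-in: two activities are connected whenever one directly follows the other in some trace. The function names and the toy traces are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Sequence, Set, Tuple

Edge = FrozenSet[str]


def directly_follows_edges(traces: Iterable[Sequence[str]]) -> Set[Edge]:
    """Undirected edges {a, b} for every pair of distinct activities where
    b directly follows a in some trace.

    Assumption: a stand-in for the causal relations used in the paper."""
    edges: Set[Edge] = set()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            if a != b:
                edges.add(frozenset((a, b)))
    return edges


def partition_edge_counts(edges: Set[Edge], clusters: List[Set[str]]) -> Tuple[int, int]:
    """Return (edges inside a single cluster, edges between clusters)."""
    cluster_of = {act: i for i, members in enumerate(clusters) for act in members}
    intra = sum(1 for edge in edges if len({cluster_of[act] for act in edge}) == 1)
    return intra, len(edges) - intra


# Hypothetical toy log: a, b, c and x, y, z form two groups that are
# linked only through the pairs c->x and b->x.
traces = [["a", "b", "c", "x", "y", "z"], ["a", "c", "b", "x", "z", "y"]]
edges = directly_follows_edges(traces)
print(partition_edge_counts(edges, [{"a", "b", "c"}, {"x", "y", "z"}]))  # (6, 2)
print(partition_edge_counts(edges, [{"a", "b", "x"}, {"c", "y", "z"}]))  # (3, 5)
```

The first grouping matches the intuition of Figure 1 (most edges internal, few crossing), while the second scatters the same edges across clusters. The paper's own clustering procedure is not reproduced by this sketch.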
With the assumption mentioned above, we put forward a new strategy for solving the problem of complex and inaccurate process models mined from real-life event logs. The basic idea is to first generate the clusters of activities by following the same rule as in the example shown in Figure 1. Afterwards, one or several sub-models are generated for each cluster, where each sub-model contains only the activities from its corresponding activity cluster. In the example from Figure 1, the sub-models for cluster A are built using the activities from cluster A. Then, for each complex and inaccurate sub-model, a trace clustering technique is employed to split it into several simple and accurate sub-sub-models so that the sub-model can be well comprehended. Finally, the generated sub-models (not including the sub-sub-models) are abstracted into high-level activities, from which a simple and accurate final high-level process model is formed. In this paper, the high-level process model together with the sub-models (each sub-model corresponds to one high-level activity in the high-level model) is used to show the details of the whole business process recorded in the event log.
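Two of the steps above, restricting the log to the activities of one cluster (the input from which that cluster's sub-models are mined) and replacing each activity by the high-level activity of its cluster, can be sketched as simple trace transformations. The Python sketch below is an illustration under assumptions, not the paper's implementation: it assumes sub-logs are obtained by projecting traces onto a cluster, and that abstraction maps each activity to its cluster label and merges consecutive repeats of the same label. Mining and trace clustering are not shown, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def project(trace: Sequence[str], cluster: Set[str]) -> List[str]:
    """Keep only the events whose activity belongs to the cluster."""
    return [act for act in trace if act in cluster]


def build_sub_logs(traces: List[List[str]],
                   clusters: Dict[str, Set[str]]) -> Dict[str, List[List[str]]]:
    """One sub-log per cluster; traces that project to nothing are dropped."""
    sub_logs: Dict[str, List[List[str]]] = {}
    for name, members in clusters.items():
        projected = (project(t, members) for t in traces)
        sub_logs[name] = [p for p in projected if p]
    return sub_logs


def abstract(trace: Sequence[str], label_of: Dict[str, str]) -> List[str]:
    """Map activities to high-level labels, merging consecutive repeats
    (an assumed abstraction rule)."""
    high_level: List[str] = []
    for act in trace:
        label = label_of[act]
        if not high_level or high_level[-1] != label:
            high_level.append(label)
    return high_level


# Hypothetical clusters and traces.
clusters = {"A": {"a", "b", "c"}, "B": {"x", "y", "z"}}
label_of = {act: name for name, members in clusters.items() for act in members}
traces = [["a", "c", "b", "x", "z", "y"], ["a", "x", "b", "y"]]

sub_logs = build_sub_logs(traces, clusters)
print(sub_logs["A"])                            # [['a', 'c', 'b'], ['a', 'b']]
print([abstract(t, label_of) for t in traces])  # [['A', 'B'], ['A', 'B', 'A', 'B']]
```

The interleaved second trace shows why relations between clusters matter: every switch from one cluster to another adds behaviour to the high-level model.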
Basically, two major benefits can be gained from the strategy proposed above. On the one hand, the original hard problem faced by current trace clustering techniques (dealing with the entire model) is transformed into small sub-problems (dealing with the sub-models). Specifically, the raw model mined from an event log may contain too many behaviors, which might be far beyond the abilities of existing trace clustering techniques. However, by distributing the large amount of behavior in the originally mined model over several small sub-models (each sub-model contains fewer behaviors but might still be complex and inaccurate), trace clustering techniques can provide better results when applied to these sub-models. On the other hand, the number of activity relations between the clusters is kept as small as possible (which means that the relations among the created high-level activities are kept as few as possible). As a result, the quality of the resulting high-level process model is improved considerably, because it contains a limited number of behaviors among its activities.
3 APPROACH DESIGN
In this section, we propose a new approach that utilises the strategy introduced in Section 2 for solving the problem of "spaghetti-like" process models mined from event logs. In Section 3.1, several important basic concepts and notations related to our technique are discussed. In Section 3.2, the details of our technique are elaborated.
3.1 Preliminaries
Event logs (van der Aalst, 2011) serve as the primary data source for various kinds of process mining techniques. The basic concepts related to event logs are given by the following definitions.
Definition 1. (Case)
Let $C$ be the set of cases. A case $c \in C$ is defined as a tuple $c = (N_c, \Theta_c)$, where $N_c = \{n_1, n_2, \ldots, n_k\}$ is the set of names of case attributes and $\Theta_c : N_c \to A_c$ is an attribute-transition function which maps the name of an attribute to the value of this attribute, where $A_c$ is the set of attribute values for case $c$.
A case is an instance of a specific business process and is uniquely identified by a case id. Each case may have several attributes, such as trace, originator, timestamp and cost. As one of the most important case attributes, the trace of a case is defined as follows:
Definition 2. (Trace)
Let $AT$ be the set of activities and $EV$ be the set of events, where each event $ev \in EV$ is an instance of a particular activity $at \in AT$. A trace is an ordered sequence of events from $EV$.
Definition 3. (Event Log)
An event log is defined as a set of cases $E \subseteq C$ in which any two cases $c_1, c_2 \in E$ are distinct, i.e., $c_1 \neq c_2$.
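Definitions 1 to 3 map onto a small data structure. The Python sketch below is one possible encoding under stated assumptions: the attribute-transition function $\Theta_c$ is represented as a dictionary from attribute names ($N_c$) to values ($A_c$), the case id is kept as a separate field that enforces the distinctness required by Definition 3, and the trace is stored under the attribute name "trace". Class names, field names and attribute values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Set


@dataclass(frozen=True)
class Event:
    activity: str  # each event ev in EV is an instance of an activity at in AT


@dataclass
class Case:
    case_id: str  # assumption: the unique case id is kept outside N_c
    attributes: Dict[str, Any] = field(default_factory=dict)  # Theta_c: N_c -> A_c

    @property
    def attribute_names(self) -> Set[str]:
        """N_c, the set of attribute names of this case."""
        return set(self.attributes)

    def theta(self, name: str) -> Any:
        """Theta_c(name): the value of the named attribute."""
        return self.attributes[name]


def make_log(cases: Iterable[Case]) -> Dict[str, Case]:
    """An event log as a collection of distinct cases, keyed by case id."""
    log: Dict[str, Case] = {}
    for case in cases:
        if case.case_id in log:
            raise ValueError(f"duplicate case id: {case.case_id}")
        log[case.case_id] = case
    return log


# Hypothetical cases; a trace is an ordered list of events.
c1 = Case("c1", {"trace": [Event("a"), Event("b"), Event("c")], "cost": 50})
c2 = Case("c2", {"trace": [Event("a"), Event("c")], "cost": 20})
log = make_log([c1, c2])
print([e.activity for e in log["c1"].theta("trace")])  # ['a', 'b', 'c']
```

Keying the log by case id is just one way to express the requirement that no case occurs twice in an event log.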
Take a simple event log $E_1 = [\langle a,b,c \rangle^{15}, \langle a,c,b \rangle^{15}, \langle a,b \rangle^{3}, \langle a,c \rangle^{5}]$ for example. This log contains 38 cases (15 + 15 + 3 + 5; only the case attribute trace is shown) and four kinds of trace¹. There are totally
¹ A trace and a kind of trace are two different concepts. Each trace belongs to a unique case. A kind of trace contains several traces which have the same sequence of events.
A Graph and Trace Clustering-based Approach for Abstracting Mined Business Process Models