On the Discovery of Explainable and Accurate Behavioral Models for

Complex Lowly-structured Business Processes

Francesco Folino, Massimo Guarascio and Luigi Pontieri

Institute ICAR, National Research Council (CNR), via P. Bucci 41C, 87036, Rende, CS, Italy

Keywords:

Data Mining, Business Process Intelligence, Trace Clustering, Workﬂow Discovery.

Abstract:

Process discovery (i.e. the automated induction of a behavioral process model from execution logs) is an

important tool for business process analysts/managers, who can exploit the extracted knowledge in key pro-

cess improvement and (re-)design tasks. Unfortunately, when directly applied to the logs of complex and/or

lowly-structured processes, such techniques tend to produce low-quality workﬂow schemas, featuring both

poor readability (“spaghetti-like”) and low ﬁtness (i.e. low ability to reproduce log traces). Trace clustering

methods alleviate this problem, by helping detect different execution scenarios, for which simpler and more

ﬁtting workﬂow schemas can be eventually discovered. However, most of these methods just focus on the

sequence of activities performed in each log trace, without fully exploiting all non-structural data (such as

cases’ data and environmental variables) available in many real logs, which might well help discover more

meaningful (context-related) process variants. In order to overcome these limitations, we propose a two-phase

clustering-based process discovery approach, where the clusters are inherently deﬁned through logical deci-

sion rules over context data, ensuring a satisfactory trade-off is between the readability/explainability of the

discovered clusters, and the behavioral ﬁtness of the workﬂow schemas eventually extracted from them. The

approach has been implemented in a system prototype, which supports the discovery, evaluation and reuse

of such multi-variant process models. Experimental results on a real-life log conﬁrmed the capability of our

approach to achieve compelling performances w.r.t. state-of-the-art clustering ones, in terms of both ﬁtness

and explainability.

1 INTRODUCTION

Process discovery (more precisely, control-ﬂow dis-

covery) techniques (van der Aalst et al., 2003) are

a valuable tool for automatically extracting a behav-

ioral schema for a business process (out of past execu-

tion traces), which can proﬁtably support key process

analysis, (re-)design, and optimization tasks.

Unfortunately, a direct application of such tech-

niques to the logs of lowly-structured processes

(featuring a large variety of behavioral patterns) is

likely to yield low quality (“spaghetti-like”) workﬂow

schemas, exhibiting both low readability and low ﬁt-

ness (Buijs et al., 2012). By contrast, reliable work-

ﬂow schemas would be very important for many real-

life ﬂexible and dynamic process management set-

tings, where little a-priori knowledge is available on

what typical work schemes are followed in reality.

To alleviate such a problem, several trace clustering

approaches (De Weerdt et al., 2013; Bose and van der

Aalst, 2009a; Bose and van der Aalst, 2009b; Song

et al., 2008; Greco et al., 2006) have been proposed in

the literature, which help recognize different homoge-

nous execution scenarios (or “process variants”), each

of which can be effectively described through a sim-

pler and more ﬁtting workﬂow schema.

However, most of these approaches only focus on

structural aspects of the traces (e.g., which activities

were performed, and in what an order), paying no at-

tention at all to all kinds of non-structural data (such

as qualitative/quantitative properties of process cases,

or other context factors characterizing the state of the

execution environment at the moment when the case

was performed),which are often available in most real

logs. And yet, such data may well help the analyst get

insight on the discovered execution scenarios, when-

ever the behavior of the process tends to be correlated

to non-structural context variables.

In fact, it was shown in (Folino et al., 2008; Folino

et al., 2011) that such correlations between non-

structural variables and process behaviors do exist

in some real application scenarios, and it is possi-

206

Folino F., Guarascio M. and Pontieri L..

On the Discovery of Explainable and Accurate Behavioral Models for Complex Lowly-structured Business Processes.

DOI: 10.5220/0005470602060217

In Proceedings of the 17th International Conference on Enterprise Information Systems (ICEIS-2015), pages 206-217

ISBN: 978-989-758-096-3

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

ble to learn a classiﬁcation model for discriminating

among different behavioral classes discovered with a

trace clustering procedure. However, the application

of standard classiﬁer-induction methods, as a post-

processing step, to the results of a purely-structural

trace clustering method is not guaranteed to achieve

good accuracy performances in general — as con-

ﬁrmed by experimental ﬁndings presented in Sec-

tion 6.

To overcome the above limitations we here try

to face a new kind of clustering-oriented process

discovery problem, speciﬁcally tailored to the case

of complex and/or lowly-structured process logs.

The ultimate goal of our approach is to discover a

high-quality multi-variant process model for a given

log, consisting of different workﬂow schemas, one

for each of discovered trace clusters, which en-

sures an optimal trade-off between: (i) the readabil-

ity/interpretability of the discovered clusters and of

their distinguishing features (possibly linked to con-

text factors), on the one hand, and (ii) the behavioral

ﬁtness of the discovered workﬂow schemas (i.e., the

capability to adequately capture the behaviors regis-

tered in the log), on the other hand. As to the for-

mer point, for a certain level of behavioral ﬁtness,

higher readability/interpretability is ensured by mod-

els featuring a lower number of clusters, easily ex-

plainable in terms of (accurate enough) discriminat-

ing rules over non-structural context variables.

In order to obtain clusters inherently correlated to

non-structural data, we focus on a speciﬁc family of

conceptual clustering models, represented as logical

(decision) rules. Technically, the search for such a

model is carried out by adopting a predictive clus-

tering approach (Blockeel and Raedt, 1998), where

context-oriented case variables are called to play as

descriptive attributes, while the target of prediction is

a number of simple structural patterns, capturing ba-

sic intra-trace precedence relationships between pro-

cess activities. As a result, a preliminary set of clus-

ters is found, each associated with a speciﬁc decision

rule representing a speciﬁc setting of context vari-

ables (i.e. a speciﬁc “context” variant). Since there

is no guarantee that the discovered clusters really

correspond to neatly different behavioral schemes, a

greedy iterative restructuring procedure is carried out,

where redundant clusters showing similar behaviors

are merged together, as long as the average ﬁtness of

the associated workﬂow schemas can be increased.

The whole approach has been implemented into

a system prototype, which fully assists the user in

discovering, inspecting and evaluating such multi-

variant process models, and helps reuse them in ad-

vanced analyses and run-time support tasks. In par-

ticular, the system is meant to both help the analyst

inspect and validate the correlations discovered be-

tween non-structural context factors and the structure

of process instances (encoded in the form of logical

clustering rules), as well as to exploit them for the

provision of advanced run-time services, which can

be very useful when enacting ﬂexible and dynamic

processes. In particular, by applying the discovered

classiﬁcation model to a partially unfolded (i.e. not

ﬁnished) process case, it is possible to show the work-

ﬂow schema of the cluster it is estimated to belong to,

as a customized (context-adaptive) process map, de-

scribing how the case may proceed.

Experimental results conﬁrmed its capability to

achieve compelling performances, in terms of both

ﬁtness and explainability, with respect to several

state-of-the-art clustering approaches: algorithm

Actitrac

(De Weerdt et al., 2013); the sequence-

based and alphabet-based versions of the approach

proposed in (Bose and van der Aalst, 2009b), based

on the mapping of log traces onto a space of be-

havioral patterns (namely, tandem repeats and max-

imal repeats, respectively); the approach proposed

in (Bose and van der Aalst, 2009a), which exploits

a k-gram representation of traces; and

DWS

algo-

rithm (Greco et al., 2006), which recursively parti-

tions a log based on a sequential patterns capturing

unexpected behaviors.

The rest of the paper is organized as follows. We

ﬁrst present some related research work in Section 2,

and a few basic concepts in Section 3. The core tech-

nical framework is described in Section 4, while Sec-

tion 5 illustrates our discovery approach and system

prototype. An empirical evaluation of our proposal

on a real-life log is discussed in Section 6, while a

few concluding remarks are drawn in Section 7.

2 RELATED WORK

Several clustering approaches have been proposed in

the literature, which try to exploit different kinds of

information captured in log traces, in order to help

recognize different behavioral classes of process in-

stances automatically.

Some of these solutions leverage sequence-

oriented techniques (Ferreira et al., 2007; Bose and

van der Aalst, 2009a), which compare entire traces

by way of string-oriented distance measures. For in-

stance, the context-based approach deﬁned in (Bose

and van der Aalst, 2009a) relies on the generic edit

distance, embedded within an agglomerative cluster-

ing scheme. A probabilistic approach was proposed

instead in (Ferreira et al., 2007), where a mixture of

OntheDiscoveryofExplainableandAccurateBehavioralModelsforComplexLowly-structuredBusinessProcesses

207

ﬁrst-order Markov models approximating the distri-

bution of log traces (still considered as sequences) is

computed via an EM scheme.

Unfortunately, all of these string-oriented tech-

niques require very expensive computations, which

may make them unsuitable for the analysis of massive

logs. Conversely, higher scalability can be achieved

by resorting to feature-based approaches, such as

those in (Greco et al., 2006; Song et al., 2008; Bose

and van der Aalst, 2009b), which reuse efﬁcient clus-

tering methods deﬁned for vectorial data, after pro-

jecting each trace onto some space of derived features.

In more detail, different kinds of features were

considered in (Song et al., 2008) to this purpose,

as a way to capture the behavior of a trace accord-

ing to different perspectives (activities, transitions,

data, performance values, etc). Each of such fea-

tures is associated with a measure that assigns a nu-

meric value to the feature over any possible trace.

After replacing each trace with a vector storing such

measures, whatever distance-based clustering method

can be reused to partition the log. In particular,

in (Song et al., 2008), the usage of three standard

distance measures (namely, the Euclidean distance,

Hamming distance and Jaccard coefﬁcient) was in-

vestigated, in combination with four alternative clus-

tering schemes (namely, K-means, Quality Thresh-

old Clustering, Agglomerative Hierarchical Cluster-

ing and Self-Organizing Maps).

As a more expressive kind of trace features, it was

proposed in (Bose and van der Aalst, 2009b) to ex-

ploit certain sequential patterns inspired to bioinfor-

matics (including tandem arrays and tandem repeats),

which allow for capturing recurring groups of corre-

lated activities and loop structures. After extracting

a set of frequent patterns of such a form from all the

given log, each trace can be transformed into a vector

storing how many times each of them occurs in the

trace. In order to effectively deal with the presence

of concurrent behavior, a variant (named “alphabet-

based”) of this vector-like encoding was also deﬁned

in (Bose and van der Aalst, 2009b), based on the

very idea of regarding all patterns sharing the same

set of activities (i.e. deﬁned over the same “alpha-

bet”) as just one dimension of the target space. In this

way, two distinct sequential patterns, e.g. the repeats

abdgh and adgbh, will be viewed as a unique feature.

It is worth noticing that, in fact, (Bose and van der

Aalst, 2009b) also explored the possibility to convert

all log traces into the (abstracted) sequences of pat-

terns occurring in them (acting as a sort of typical

sub-processes), and then comparing them by way of

some edit-distance measures.

A top-down recursive clustering scheme was pro-

posed in (Greco et al., 2006), where, at each step, a

workﬂow model is extracted from a set of log traces,

which may be further partitioned (until a maximal

number of clusters is reached) in order to obtain a

collection of more precise models (representing dis-

tinguished execution scenarios). In order to accom-

plish each of these trace clustering tasks, algorithm

k-means is applied to an ad-hoc propositional repre-

sentation of the traces, based on sequential patterns

(named “discriminant rules”) capturing unexpected

behaviors (w.r.t. the workﬂow schema currently as-

sociated with the traces that are to be partitioned).

A ﬁtness-aware trace clustering approach was ﬁ-

nally proposed in (De Weerdt et al., 2013), which tries

to group the traces in a way that a user-speciﬁed level

of ﬁtness for each output process model is achieved.

Differently from our work, all the above cluster-

ing approaches are not concerned with the problem

of discovering a partition of the given traces that is

strongly correlated with context-related factors. Such

a problem was partly addressed in (Folino et al., 2008;

Folino et al., 2011), where a classiﬁcation model

was learnt from a set of trace clusters (discovered

by way of the structural clustering method in (Greco

et al., 2006), in order to possibly discriminate among

them, on the basis of non-structural information as-

sociated with the traces. However, the application

of standard classiﬁer-induction methods, as a post-

processing step, to the results of a purely-structural

trace clustering method is not guaranteed to achieve

good accuracy performances. As mentioned in the

previous section, this is the reason why we prefer to

adopt a predictive clustering approach, focusing only

on partitions of the traces that can be deﬁned in terms

of logical rules over their associated non-structural

(context-oriented) data.

Before concluding this section, let us notice that

the usage of predictive clustering techniques (Bloc-

keel and Raedt, 1998) is not completely novel in a

process mining setting. Indeed, it was originally pro-

posed in (Folino et al., 2012), but with the different

aim of supporting run-time predictions for a case-

oriented performance metrics. In fact, the basic idea

of predictive clustering is that, once discovered a suit-

able clustering, accurate predictions for new instances

can be made by ﬁrst estimating the clusters they are

deemed to belong to.

3 PRELIMINARIES

Log Traces. For each process instance (a.k.a

“case”) we assume that a trace is recorded, storing

the sequence of events happened during its enact-

ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems

208

ment. Let T be the universe of all (possibly partial)

traces that may appear in any log of the process un-

der analysis. For any trace τ ∈ T , len(τ) is the num-

ber of events in τ, while τ[i] is the i-th event of τ, for

i = 1 .. len(τ), with task(τ[i]) and time(τ[i]) denoting

the task and timestamp of τ[i], respectively. We also

assume that the ﬁrst event of each trace is always as-

sociated with a unique “initial” task (possibly added

artiﬁcially), and its timestamp registers the time when

the corresponding process instance started.

For any trace τ, let context(τ) be a tuple gathering a

series of data about the executioncontext of τ, ranging

from intrinsic data properties to environmental vari-

able characterizing the state of the BPM system when

τ was enacted.

For ease of notation, let A

denote the set of all the

tasks (a.k.a., activities) that may occur in some trace

of T , and context(T ) be the space of context vec-

tors — i.e., A

= ∪

τ∈T

tasks(τ), and context(T ) =

{context(τ) | τ ∈ T }.

Finally, a log L is a ﬁnite subset of T .

Workﬂow Schemas and Behavioral Proﬁles. Var-

ious languages have been proposed in the literature

for specifying the behavior of a business process, in

terms of its composing activities and their mutual de-

pendencies— such as Petri nets (van der Aalst, 1998),

causal nets (van Der Aalst et al., 2011), and heuris-

tics nets (Weijters and van der Aalst, 2003). For the

sake of concreteness we next focus on the language of

heuristics nets (Weijters and Ribeiro, 2011), where a

workﬂow schema is a directed graph where each node

represents a process activity, each edge (x,y) encodes

a dependency of y on x, while each fork (resp., join)

node can be associated with cardinality constraints

over the the edges exiting from (resp., entering) it.

The behavior modeled by a workﬂow schema can

be captured approximately by way of simple pair-

wise relationships between the activities featuring in

it, named (causal) behavioral proﬁles (Weidlich et al.,

2011), which can be computed efﬁciently for many

classes of models.

Let W be a workﬂow schema, and A(W) be its as-

sociated activities. Let ≻

be a “weak order” relation

inferred from W, such that, for any x,y ∈ A(W), it is

y ≻

x iff there is at least a trace admitted byW where

y occurs after x. Then the behavioral proﬁle matrix

of W, denoted by B(W), is a function mapping each

pair (x,y) ∈ A(W) × A(W) to an ordering relation in

{ ,+, k}, as follows: (i) B(W)[x, y] = , iff y≻

and x⊁

y (strict order); (ii) B(W)[x,y] = +, iff

x⊁

y and y⊁

x (exclusiveness); (iii) B(W)[x,y] =k,

iff x≻

y and y≻

x (either interleaving or loop).

Let τ be a trace, over trace universe T , x and y be

two activities in A

, and B be a behavioral proﬁle

matrix. Then we say that τ violates (resp., satisﬁes)

B[x, y], denoted by τ 6⊢ B[x,y] (resp., τ ⊢ B[x,y]), if

the occurrences of x and y in τ infringe (resp., fullﬁll)

the ordering constraint stated in B[x, y]. More speciﬁ-

cally, it is τ 6⊢ B[x, y] iff there exist i, j ∈ {1,..., len(τ)}

such that τ[i] = y, τ[ j] = x, and either (i) B[x,y] = +,

or (ii) B[x, y] = and i < j.

Conceptual Clustering Models. The core assump-

tion under our work is that the behavior of a process

depends on context factors. Hence, in order to predict

the structure of any trace τ, we regard its associated

context properties context(τ) as descriptive attributes.

For the sake of interpretability, we seek a concep-

tual clustering model encoded in terms of decision

rules over context(T ). Let us deﬁne conceptual clus-

tering rule over a trace universe T as a disjunction

of conjunctive boolean formulas, of the form [(A

∈

) ∧ (A

∈ V

) ∧ ... ∧ (A

∈ V

)] ∨ [(A

∈ V

) ∧

... ∧(A

∈ V

)]∨. ..∨ [(A

∈ V

)∧. ..∧(A

∈ V

)],

where n, k

,.. . ,k

∈ N, and, for each i ∈ {1,. .., n}

and j ∈ { 1,..., k

}, A

is a descriptive attribute de-

ﬁned on T ’s instances (i.e. one of the dimensions of

the space context(T )), and V

is a subset of the do-

main of attribute A

For any L ⊆ T and for any such a rule r, let

cov(r,L) be the set of all L’s traces that satisfy r.

A conceptualclustering for L is a pair C = hCS,R i,

such that CS = {c

,...,c

} is a partition of L into n

clusters (for some n ∈ N), and R is a function map-

ping the clusters inCS to mutually-exclusiverules like

those above —note that

i=1

= L and

i=1

0),

while cov(R (c),L) = c for any c in CS. Due to their

generality, such rules can split any subset L

′

of Z into

n clusters — {cov(R (c

),L

′

),.. .,cov(R (c

),L

′

)} is

indeed a partition of L

′

In this work, we propose to discover a special

kind of conceptual clustering model for a given set

of log traces, by resorting to a predictive cluster-

ing approach (Blockeel and Raedt, 1998). In gen-

eral, in a predictive clustering setting, two kinds of

attributes are assumed to be available for each el-

ement z of a given space Z = X × Y of instances:

descriptive attributes and target ones, denoted by

descr(z) ∈ X and targ(z) ∈ Y, respectively. Hence,

the goal is to ﬁnd a partitioning function (similar to

the conceptual clustering models above) that min-

imizes

∑

| ×Var({targ(z) |z ∈ C

}), where vari-

able C

ranges over current clusters, and Var(S) is the

variance of set S. In our setting, the context data as-

sociated with each trace will be used as its descriptive

features, whereas some basic behavioral patterns ex-

OntheDiscoveryofExplainableandAccurateBehavioralModelsforComplexLowly-structuredBusinessProcesses

209

tracted from its structure (i.e. from the sequence of

activities appearing in it) will be used as targets.

4 FORMAL FRAMEWORK

The kind of process model we want to eventually

ﬁnd mixes a collection of workﬂow schemas, asso-

ciated with distinct execution clusters (regarded as

alternative execution scenarios), with a conceptual

trace clustering model (i.e. a collection of decision

rules) that allows for discriminating among the clus-

ters, based on their link to relevant context factors.

Such a model is formally deﬁned next.

Deﬁnition 1 (MVPM Model). Given a workﬂow log

L, a multi-variant process model (short MVPM) for L

is triple M = hCS, W , C i such that:

• CS = {L

,.. .L

} is a set of trace clusters, for

some n ∈ N, deﬁning a partition of L —i.e.,

i=1..n

0 and

i=1..n

= L;

• W is a function mapping each cluster c ∈ CS to a

workﬂow schema, denoted by W (c);

• R is function mapping each cluster c ∈ CS to a

conceptual clustering rule, denoted by R (c), such

that C

= hCS,{R (c)|c ∈ CS}i is a conceptual

clustering for T .

The size of M , denoted by size(M), is the number of

clusters in CS with it — i.e. size(M) = |CS|.

In this way, L’s traces are split into different (be-

haviorally homogenous) clusters, each of which is

discriminated by a speciﬁc clustering rule (boolean

formula over traces’ properties), associated with the

cluster by function R . Each of these clusters, viewed

as a distinguished process variant, is also associated

with a workﬂow schema (via function W ), summa-

rizing how process activities are typically executed in

that cluster.

Problem Statement and Solution Strategy. Con-

ceptually, the induction problem faced in this pa-

per may be stated as the search for a

MVPM

of min-

imum size among those maximizing some confor-

mance measure, quantifying the ability of the model

to describe the behaviors registered in the input log

(or in a different test log used for validation).

Various log conformance metrics have been pro-

posed in the literature (see, e.g., (Alves de Medeiros

et al., 2008; Rozinat and van der Aalst, 2008)) to com-

pare the behavior registered in the log to that modeled

by the schema. Since most of these metrics (usually

deﬁned for Petri net models) rely on a log replay strat-

egy and imply expensive model states’ explorations,

they cannot be integrated in our search for an opti-

mal

MVPM

. Therefore, we next introduce an approx-

imated (but scalable) conformance measure, named

fit

, which simply compares the behavioral proﬁles

of each schema appearing in the

MVPM

with the cor-

responding sub-log it was discovered from. Given a

MVPM

M and a log L, the ﬁtness of M w.r.t. L, denoted

by fit

(M,L), is deﬁned as follows:

fit

(M,L) =

∑

∈M.CS

| × λ(W (L

),L

) (1)

where, for any trace cluster L

and workﬂow

schema W

, λ(W

) =

|A(W

× |{(x, y) ∈ A(W

) ×

A(W

) | ¬∃ τ ∈ L

s.t. τ 6⊢ B(W

)[x,y]}|. Notice that

the function λ quantiﬁes the fraction of W

’s behav-

ioral proﬁles that are not violated by L

’s traces, used

here as a rough (but scalable) ﬁtness score.

To solve such a discovery problem efﬁciently, we

devised a two-phase strategy, consisting of two main

computation tasks: (i) extract a (possibly ﬁne-grain)

conceptual clustering CM for L, by using a simpliﬁed

propositional representation of the traces, where the

structure of each of them (i.e. the sequence of activ-

ities featuring in it) is encoded into a vector of basic

behavioral features; (ii) restructure CM, by merging

together as many clusters as possible, provided that

the actual ﬁtness of the workﬂow schemas associated

with CM’s clusters does not decrease.

Notably, for scalability reasons, the actual quality

(measured through function fit) of the workﬂow as-

sociated with each trace cluster is totally disregarded

in the ﬁrst phase, where the search of a clustering

solution only tries to minimize the expected infor-

mation loss over an approximated ﬂat representation

of the traces, according to a predictive clustering ap-

proach (Blockeel and Raedt, 1998). In particular, to

accomplish the ﬁrst task, we resort to an existing PCT

learning method (CLUS), providedwith an ad-hoc en-

coding of the log, named p-view and described in de-

tails in what follows.

Propositional Trace Encoding (used in Phase I).

Basically, we want to use the context data of the

traces (and possibly the activities occurring in them)

as descriptive features for partitioning the given log

into behavioral clusters, by way of suitable clustering

rules. To concisely represent major behavioralaspects

of the traces, a target variable is deﬁned over each ac-

tivity pair, capture basic precedence relationships.

In more details, let B be the given behav-

ioral proﬁle matrix of some preliminary workﬂow

schema, trying to capture all possible behaviors

of the process analyzed. Then, for each trace

ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems

210

τ and each pair (a

) of activities, we can de-

ﬁne a target variable v(τ,a

) as follows: (i)

v(τ,a

) =

f(τ,a

)

2×c(τ,a

)

if both a

and a

occur in

τ, where f(τ,a

) =

sum

({

sgn

( j

′

− i

′

) | i

′

, j

′

∈

{1,. .. ,len(τ)}∧ task(τ[i

′

]) = a

∧ task(τ[ j

′

]) = a

})

—with

sgn

denoting the signum function— and

c(τ,a

) = |{ (i

′

, j

′

) | i

′

, j

′

∈ {1,.. .,len(τ)} ∧ i

′

∧ task(τ[i

′

]) = a

∧ task(τ[ j

′

]) = a

}|; and (ii)

v(τ,a

) =

null

if does not contain both a

and a

In this way, v(τ,a

) keeps trace of the mutual po-

sitions of a

and a

, if both occur in τ (case ii); other-

wise (case ii), we just set v(τ,a

) =

null

Based on such structural target variables, we next

formally deﬁne the propositional view (p-view) of a

log, to be eventually used to extract a conceptual clus-

tering model, by way of a PCT learner.

Deﬁnition 2 (Log View). Let L be a log over trace

universe T , and A

= {a

,...,a

} be the associated

activity set. Then, the propositional view (short p-

view) of L, denoted by V (L), is a relation containing

a tuple z

= descr(z

) ⊕ targ(z

) (with ⊕ denoting

tuple concatenation) for each τ ∈ L, such that: (i)

descr(z

) = context(τ) ⊕ TV(τ), where TV(τ) is a

vector in {0, 1}

s.t., for each i ∈ {1,.., k}, TV(τ)[i]=1

iff a

occurs in τ; and (ii) targ(z

) = hv(τ,a

...,v(τ,a

),v(τ,a

),...,v(τ,a

...,v(τ,a

k−1

),..., v(τ,a

k−1

),..., v(τ,a

)i.

Workﬂow Schema Similarity (used in Phase II)

Rather than exploring all possible sequences of pair-

wise merges, over the clusters found in the ﬁrst phase,

we propose to adopt a greedy iterative agglomeration

procedure, where the two clusters exhibiting the most

similar behavioral models are considered for being

possibly merged, at each iteration. In order to efﬁ-

ciently estimate how similar two workﬂow schemas

are to each other, we next introduce an approximated

similarity function, deﬁned as a variant of that pro-

posed in (Kunze et al., 2011).

Deﬁnition 3 (Schema Similarity). Let W

and W

two workﬂow models, A

, A

be their respective activ-

ity sets (i.e. A

= A(W

), and A

= A(W

)), and B

, B

their respective behavioral proﬁles (i.e. B

= B(W

and B

= B(W

)). Let S

) = {(x,y) | (x,y) ⊆

A(W

) ∩ A(W

) ∧ B

[x,y] = h ∧ h ∈ { , +,k} ∧ k ∈

{i, j}} be the ordering relationships of type h that W

and W

share. Then, the BP-similarity between W

and W

, denoted by sim

) : W

× W

→ [0,1]

deﬁned as:

sim

) =

∑

h∈{ ,+,k}

· J (S

)+β

·J (A

)

Input: A log L over trace universe T ,

minimal clusters’ coverage minCov ∈ (N).

Output: A

MVPM

for L;

Method: Perform the following steps:

// Phase I: Build an Initial Trace Clustering Model

1. V := V (L); // compute a p-view for L (cf. Def. 2)

2. hS, Ri :=

minePC

(V, minCov); //S is an L’s partitioning

// and {R(c)|c ∈ S} is a conceptual clustering model

3. for each cluster c ∈ S

4. W (c) :=

mineWFS

(c);

5. end for

// Phase II: Bottom-up Clusters’ Merging

6. Q := {{x,y} |x, y ∈ S} // mergeable clusters’ couples

7. repeat

8. let ( ˆc

, ˆc

) = argmax

∈Q

{sim

(W (c

),W (c

))}

9. c

new

:= ˆc

∪ ˆc

; W

new

mineWFS

new

);

10. Q := Q − { {x,y} | {x,y} ⊆ {ˆc

, ˆc

} }

∪ { {x,c

new

} | x ∈ S− { ˆc

, ˆc

} };

11. if | ˆc

| · λ(W ( ˆc

), ˆc

) + | ˆc

| · λ(W ( ˆc

), ˆc

)

≤ |c

new

| · λ(W

new

) then

12. S := S − { ˆc

, ˆc

} ∪ {c

new

};

13. W (c

new

) := W

new

;

14. R (c

new

) := R ( ˆc

) ∨ R ( ˆc

);

15. end if;

16. until Q =

17. return hS,W , R i

Figure 1: Algorithm

MVPM-mine

where, for any sets X and Y, J (X , Y ) =

|X∩Y|

|X∪Y|

is the

Jaccard coefﬁcient of X and Y, while β

,β

∈ [0,1]

are weights such that

∑

h∈{ ,+,k}

+ β

= 1. 

In this way, the more two workﬂow models overlap

over their activities and behavioral proﬁles, the more

similar they are deemed. Notice that, in the experi-

ments discussed later on, the four components of the

measure above were all weighted uniformly —i.e. we

set β

= β

= 1/4.

5 DISCOVERY APPROACH

5.1 Algorithm MVPM-mine

Figure 1 illustrates an algorithm, named

MVPM-mine

for inducing a

MVPM

out of a given log. As mentioned

previously, the algorithm follows a two-phase ap-

proach: it ﬁrst ﬁnds a preliminary conceptual cluster-

ing model (Steps 1-3) for the traces, and then restruc-

tures the discovered clusters, along with their asso-

ciated clustering rules and workﬂow schemas (Steps

4-18), via a bottom-up iterative aggregation scheme.

The ﬁrst phase simply amounts to encoding the in-

put log traces into a propositional dataset (Step 1) —

where each trace is equipped with both descriptive

variables and summarized behavioral features, as for-

OntheDiscoveryofExplainableandAccurateBehavioralModelsforComplexLowly-structuredBusinessProcesses

211

mally deﬁned in Def. 2— before giving it as input

to function

minePC

(Step 2). This function imple-

ments the PCT-learning method described in (Bloc-

keel and Raedt, 1998), and returns a conceptual clus-

tering model, here denoted as a set C of trace clusters

with their associated clustering rules —for each clus-

ter c, R (c) is the rule that allows for discriminating c

from all other clusters. Essentially, this function relies

on a top-down partitioning scheme, where a split (ex-

pressed in terms of descriptive attributes) that locally

minimizes clusters’ variance (over the target space)

is greedily selected at each step, provided that reduc-

tion of variance is signiﬁcant enough (according to an

F-test) and that the selected cluster contains minCov

traces at least.

Each discovered trace cluster (capturing a context-

dependent process execution variant) is then equipped

with a workﬂow schema (Steps 3-5), by using func-

tion

mineWFS

, which implements the Flexible Heuris-

tics Miner (FHM) method described in (Weijters and

Ribeiro, 2011) —this choice was mainly due to the

scalability and robustness to noise of this method.

It is worth noticing that our approach is paramet-

ric w.r.t. the actual implementation of both functions

mineWFS

and

minePC

, so that other solutions could

be used for implementing them. In particular, a wide

range of workﬂow discovery algorithm exist in the lit-

erature which could be exploited to obtain more ex-

pressive process models (such as Petri nets, or Event

Process Chains). However, such an issue is beyond

the scope of this work, yet deserving deeper investi-

gation in the future.

The second part of the algorithm follows a bottom-

up merging scheme, somewhat resembling that of ag-

glomerative clustering methods. At each step of the

loop (Steps 8-16) a couple of clusters (ˆc

and ˆc

) is

greedily chosen, such that their respective workﬂow

schemas are the two ones sharing the more behav-

ior, among all those schemas associated with current

trace clusters —where behavioral similarity is evalu-

ated here with the approximate measure in Def. 3.

Before merging the selected clusters, however, a

check is performed(Step 11), to assess whether such a

merge really allows for improving the overall confor-

mance measure fit —evaluated on the

MVPM

gathering

all the workﬂow schemas induced from current trace

clusters. Only if the check turns successful, the merge

is conﬁrmed, and all the three components of cur-

rent

MVPM

solution are updated (Steps 12-14). In any

case, the couple of clusters chosen is removed from

Q (of candidate clusters’ pairs), as to avoid consider-

ing it in subsequent iterations —this allows to prune

the search space of the restructuring phase. It is easy

to see that the following property trivially holds, as

concerns the correctness of the check in Step 11.

Property. The conformance of MVPM hCS,W ,R i,

measured as fit(hCS, W ,R i,L), never decreases

during any computation of MVPM-mine. 

It is worth noticing that, in the actual implemen-

tation of the algorithm, Q is maintained in the form

of an ad-hoc collection of priority queues, one for

each cluster in CS, all implemented as heaps. Specif-

ically, for each cluster c ∈ CS, the associated queue

stores any other cluster that may still be merged with

c, using the respective similarity to c as priority value.

Moreover, a dictionary (implemented as a hash table)

is also used to support key-based searches over the

clusters. In order to ensure fast accesses to the con-

tents of each queue, the extraction of an element is

performed by simply marking the corresponding en-

try as “invalid”, without actually removing it from the

queue, unless it occupies the top position. Similarly,

an update to a priority value is carried out through a

“virtual” removal, as explained before, followed by

an insertion. Each time a cluster c is extracted from

(resp., added to) CS, the corresponding queue is de-

stroyed (resp., created), and c is virtually removed

from (resp., added to) the queue of any other clus-

ter. The search for the closest pair of clusters (Step 8)

is eventually accomplished by only accessing the top

elements of these queues, and then selecting the one

with the highest score.

Let n be the number of clusters that were obtained

in the ﬁrst computation phase (Step 3). Since no more

than n − 1 additional clusters can be created in the

second phase, any priority queue may contain O(n)

(real or invalid) elements. Thus, both the initialization

of these queues, and all subsequent accesses to them

can be performed in O(n

× logn).

5.2 System Prototype

Algorithm

MVPM

(cf. Fig. 1) has been fully imple-

mented in a system prototype, in order to apply and

validate the approach in practical cases. The logical

architecture of the system is depicted in Figure 2.

The Learning block is responsible for supporting

the discovery of all kinds of models composing a new

MVPM

. In particular, the Context Data Derivation mod-

ule is devised to enrich the vector of each case’s data

with additional context-related ﬁelds, which may al-

low for better discriminating among the discovered

behavioral clusters. Such additional case attributes

include both statistics computed on existing attributes

– e.g., the

workload

quantifying the the number of

total cases currently assigned to each resource or to

each high-level organizational entity – as well as new

ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems

212

Testing &

Evaluation

Model Analysis and Reuse

Runtime

Support

Discovered Models

Classification

Models

Workflow

Models

Context

Data

Derivation

Conceptual

Clustering

Cluster

Merging

Learning

Gateway

User Interface

BPM

Platform

Workflow

Mining

Figure 2:

MVPM

plug-in architecture.

properties deﬁned by the analyst —e.g., what re-

source/team was initially allocated to the case.

The traces, enriched with these supplementary at-

tributes, are delivered to the Conceptual Clustering

module, which groups traces sharing both similar

structure and data values, by leveraging some core

functionalities of system CLUS (CLUS), a predic-

tive clustering framework supporting the induction of

PCT models from tabular data. To this end, a propo-

sitional encoding of the traces is built by the module,

mixing both context data and structural features (cf.

Deﬁnition 2).

For each leaf cluster in the PCT, the Workﬂow Min-

ing module extracts a workﬂow schema, and com-

putes the behavioral proﬁles associated with it.

Clusters exhibiting similar behaviors (according to

the similarity function in Deﬁnition 3) are then itera-

tively merged. The merging procedure, performed by

the Cluster Merging module, continues until the con-

formance of the current

MVPM

is increased, in order to

ﬁnd a good balance between the size of the model,

and its ability to effectively describe different behav-

ioral classes. When two clusters are merged, their re-

spective classiﬁcation rules are merged accordingly,

in order to keep the clustering model updated.

For testing purposes and further analysis, both the

workﬂow and the classiﬁcation models are stored into

ad-hoc repositories in the Discovered Models block.

As for the Model Analysis and Reuse block, the

Testing & Evaluation model is mainly meant to assess

the quality of discovered workﬂow schemas and asso-

ciated classiﬁcation models. In particular, the module

supports the computation of different conformance

metrics (described in Section 6), based on replaying

new test traces through the workﬂow schema of the

cluster they are assigned to.

Finally, the Runtime Support module is devoted to

provide advanced run-time services, which can turn

very useful for supporting the enactment of ﬂexible

and dynamic processes. In particular, each discovered

classiﬁcation model can be applied to any partially

unfolded (i.e. not ﬁnished) process case, in order to

assign it to one of the discovered process variants (i.e.

trace clusters). The workﬂow schema associated with

the selected cluster can be then presented to the user,

as a customized (context-adaptive) process map, de-

scribing how the case could proceed (based on how

past cases belonging to the same context variant be-

haved). In addition, this module is capable to dynam-

ically evaluate the degree of compliance between any

new (just ﬁnished or still ongoing) trace and its ref-

erence workﬂow schema, in order to possibly detect

deviating behaviors.

6 EXPERIMENTS

The approach proposed so far has been validated on

the log of a real problem management system (named

VINST), offered by Volvo IT Belgium, as a bench-

mark dataset for the 2013 BPI Challenge (Steeman,

2013). Precisely, we used 1487 traces spanning from

January 2006 to May 2012.

Each log event stores 8 data attributes (namely,

status

substatus

resource

res country

support team

org line

, and

org country

, and

functional division

). Moreover, for each prob-

lem case p, two attributes are associated with p’s

trace: p’s

impact

(medium, low, or high), and the

product

affected by p.

In order to enrich each log trace τ with further con-

text data, we extended it with a series of newattributes

indicating the support team and the country hosting

the solver team appearing in τ’s ﬁrst event (denoted

firstOrg

and

firstCountry

, respectively), and

two other “environmental” variables: a

workload

in-

dicator, storing the total number of problems open in

the system when τ started, and several time dimen-

sions (namely,

week-day

month

and

year

) derived

from the timestamp of τ’s ﬁrst event.

6.1 Evaluation Settings

Several kinds of metrics have been proposed

in the literature to evaluate discovered workﬂow

schemas (Buijs et al., 2012). In particular, ﬁtness met-

rics, quantifying the capability to replay the log, are

typically employed as the main evaluation tool, while

other kinds of metrics (e.g., precision-oriented ones)

OntheDiscoveryofExplainableandAccurateBehavioralModelsforComplexLowly-structuredBusinessProcesses

213

Table 1: Results (avg±stdDev) obtained by

MVPM-mine

and several competitors. The best value in each column is in bold.

Trace Clustering Method Fitness BehPrec #nodes #edges #edgesPerNode

MVPM-mine (with descriptive structural features)

0.865±0.010 0.635±0.009 6.2±0.0 9.3±0.2 2.8±0.1

MVPM-mine (without descriptive structural features)

0.766±0.075 0.612±0.027 8.0±0.0 13.6±0.2 3.4±0.0

ACTITRAC(De Weerdt et al., 2013)

0.317±0.057 0.667±0.017 8.2±0.1 14.1±0.4 3.4±0.1

TRMR(Bose and van der Aalst, 2009b)

0.651±0.024 0.604±0.010 7.8±0.1 13.6±0.2 3.5±0.1

A-TRMR(Bose and van der Aalst, 2009b)

0.764±0.027 0.618±0.011 7.7±0.1 13.3±0.2 3.4±0.1

KGRAM(Bose and van der Aalst, 2009a)

0.558±0.026 0.655±0.007 8.0±0.1 13.2±0.2 3.3±0.0

Table 2: Accuracy of the classiﬁcation models induced from the discovered clusters. Best values per column are in bold.

With descriptive structural features Without descriptive structural features

Trace Clustering Method Generalization error Cramer’s coefﬁcient Generalization error Cramer’s coefﬁcient

MVPM-mine

0.924 0.849 0.823 0.732

ACTITRAC(De Weerdt et al., 2013)

0.721 0.135 0.725 0.124

TRMR(Bose and van der Aalst, 2009b)

0.843 0.575 0.693 0.387

A-TRMR(Bose and van der Aalst, 2009b)

0.883 0.534 0.699 0.297

KGRAM(Bose and van der Aalst, 2009a)

0.789 0.592 0.560 0.330

can support ﬁner grain comparisons among schemas

with similar ﬁtness scores.

In our tests, the ﬁtness of each discovered

heuristics-net schema was speciﬁcally measured via

the Improved Continuous Semantics Fitness deﬁned

in (de Medeiros, 2006). Essentially, the ﬁtness score

of a schemaW w.r.t. a log L (denoted by Fitness(W,L))

is the fraction of L’s events that W can parse exactly,

with a special punishment factor beneﬁtting schemas

yielding fewer replay errors in fewer traces.

The behavioral precision of schema W w.r.t. log

L, denoted by BehPrec(W, L), is the average fraction

of activities that are not enabled when replaying L

through W:

BehPrec(W, L) =

|A(W)| × |traces(L)|

∑

τ∈traces(L)

|{a ∈ A(W)|W did not enable a in τ’s replay}|

We preferred these two rough conformance metrics

to standard ones deﬁned for Petri-net models, since

we experienced long computation times and very low

conformance scores every time we applied the latter

ones to heuristics nets – this was likely due to the

many invisible transitions that tend to be produced

when converting them into Petri nets by using the plu-

gins available in ProM.

We also considered three structural-complexity in-

dicators: the numbers of nodes (#nodes) and of edges

(#edges), and the average number of edges per node

(#edgePerNode).

As competitors, we considered ﬁve state-of-the-art

approaches: algorithm

Actitrac

(De Weerdt et al.,

2013); the sequence-based and alphabet-based ver-

sions of the approach proposed in (Bose and van der

Aalst, 2009b) (denoted by

TRMR

and

A-TRMR

, respec-

tively), which relies on a mapping of traces onto a

space of behavioral patterns (precisely, tandem re-

peats and maximal repeats); the approach proposed in

(Bose and van der Aalst, 2009a) (denoted by

KGRAM

which exploits a k-gram representation of traces, and

DWS

(Greco et al., 2006), which recursively parti-

tions a log based on a special kinds of (discriminat-

ing) sequential patterns capturing unexpected behav-

iors (w.r.t. current workﬂow schemas).

Since some of these methods (namely,

TRMR

KGRAM

and

A-TRMR

) need to be provided with the desired

number of clusters, we instructed them to search

for as many trace clusters as those discovered by

our approach, while letting the other methods (i.e.,

Actitrac

and

DWS

) autonomouslydetermine the right

number of clusters. When running algorithm

DWS

with its default parameter setting, no actual trace par-

titioning was found for the log (i.e., only one cluster

was obtained). Therefore, since our evaluation founds

on assessing the capability to recognize (and discrimi-

nate among) different behavioral clusters, we will dis-

regard

DWS

in the remainder of our analysis.

6.2 Test Results

Table 1 shows the quality results obtained by algo-

rithm

MVPM-mine

, when run according to two alter-

native settings: (i) using, for each log trace, say τ,

the whole set of descriptive features deﬁned in Def. 2,

namely context properties (i.e. context(τ) structural

features (i.e. TV(τ)), or (ii) using only non-structural

features (i.e., context(τ)). For each method and each

evaluation measure, the average over all trace clus-

ters, performed in 10-fold cross-validation, is shown

in the table. In particular, in each clustering test, we

ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems

214

Figure 3: An execution scenario discovered by algorithm

MVPM-mine

: Workﬂow Schema of Cluster # 2.

Figure 4: An execution scenario discovered by algorithm

MVPM-mine

: Workﬂow Schema of Cluster # 4.

computed an overallFitness (resp., BehPrec) measure

for each method, as the weighted average of the Fit-

ness (resp., BehPrec) scores received by the work-

ﬂow schemas that were induced (with FHM) from

all the trace clusters discovered in the test. Since in

all cross-validation trials

MVPM-mine

always found 6

trace clusters, the same number of clusters was given

as input to the competitors.

It is easy to see that the version of

MVPM-mine

ex-

ploiting both structural and context features neatly

outperforms all competitors in terms of average work-

ﬂow ﬁtness – the primary comparison metrics – with

an outstanding score of 0.865. On the other hand,the

achievementof

MVPM-mine

is quite good also in terms

of precision (0.635), if compared to other approaches.

The remaining measurements in the table show that

such achievements were obtained by way of simple

(and readable) workﬂow schemas:

MVPM-mine

ex-

hibits, indeed, the lowest average numbers of nodes

(6), edges (9) and edges per node (2.8).

Surprisingly good results (comparable to those

obtained by its best-performing competitors) were

achieved by

MVPM-mine

even when the descriptive

features used for the clustering do not convey any

kind of structural information. This result is quite re-

markable, as the discovered classiﬁcation (i.e. con-

ceptual clustering) model can be exploited to predict

the structure of any new process case from the very

moment when it is started, by only exploiting the

knowledge of its associated context data.

This latter impression is conﬁrmed also by the

results in Table 2, concerning the capability to ac-

curately discriminate among the discovered clusters.

Since none of the competitors directly learn any kind

of classiﬁcation model, for each of these methods, we

used the cluster identiﬁer assigned to each trace as it

were its associated class label. A logical classiﬁcation

model was the induced with the PCT-learning algo-

rithm implemented in (CLUS) (provided with a single

nominal target, instead of a vector of numeric ones,

like in function

minePC

of algorithm

MVPM-mine

Again, two different learning settings are considered:

setting a, where both structural and context data are

used as descriptive features; and setting b, where

only context features are used as descriptive attributes

(for classiﬁcation purposes). For evaluating the accu-

racy of the discovered classiﬁcation models, we em-

ployed two quality metrics commonly used in classi-

ﬁcation scenarios: (i) generalization error (a.k.a. ac-

curacy), i.e. the number of correct predictions the

model performed over the total number of predic-

tions; and Cramer’s coefﬁcient (Cramer, 1999), gaug-

ing the strength of statistical correlation between the

real and the predicted classiﬁcation label.

From Table 2, it is clear that

MVPM-mine

still out-

performs all competitors, so conﬁrming its ability to

correctly discriminate different execution scenarios

that can be, at a later stage, accurately re-discovered

when a classiﬁcation model is learned over them. It

is important to notice that such an ability is kept also

OntheDiscoveryofExplainableandAccurateBehavioralModelsforComplexLowly-structuredBusinessProcesses

215

in the critical setting b, where no structural informa-

tion is exploited for separating the discovered clus-

ters. Conversely, all competitors suffers from a neat

worsening of performances in this setting.

Table 3: Classiﬁcation rules describing two distinct trace

clusters discovered by

MVPM-mine

against the whole log.

Trace Cluster Clustering rule

(

product

∈ {PROD278,PROD473, PROD289,.. .} ∧

firstCountry

∈ {UnitedKingdom,Thailand,...} ∧

firstOrg

∈ { G157 2nd,T8 2nd, G186 2nd,...} ∧

workload

≤ 435) ∨

(

product

∈ {PROD374,PROD729, PROD80,.. .} ∧

firstOrg

∈ { G349 3rd, J2 2nd, G67 2nd,...} ∧

workload

> 477)

substatus

=assigned ∧

substatus

=wait ∧

substatus

=awaiting assignement ∧

firstOrg

∈ { T17 2nd,N14 2nd,N7 2nd, ...} ∧

workload

>423

Finally, in order to help the reader get a concrete

idea of the kind of knowledge that can be extracted

with our approach, Figures 3 and 4 show two work-

ﬂow models induced from two of the trace trace clus-

ters found in a run of

MVPM-mine

(launched according

to setting a). It is easy to notice that the two mod-

els are quite different, so allowing to reckon that the

handling of problems tended to follow quite different

execution scenarios in the analyzed process instances.

For the sake of completeness, we also report, in Ta-

ble 3, the clustering rules associated with these trace

clusters. Notably, these easily-interpretable sets of

rules let us identify which speciﬁc settings of context

variables tended to determine the happening of each

of the execution scenarios modeled in Figures 3 and 4.

7 CONCLUSIONS

We have presented a new clustering-oriented process

discovery approach, which addresses the critical case

of lowly-structuredprocess logs through the induction

of a high-quality multi-variant model. The approach

is meant to ensure good interpretability of the clus-

ters discovered, and good levels of ﬁtness in the rep-

resentation of each cluster’ behavior. The approach

has been implemented into a system prototype, which

supports the user in advanced analyses and monitor-

ing tasks.

Tests on a real-life log assess the ability of the ap-

proach to ﬁnd both (i) readable workﬂow models with

high levels of behavioral ﬁtness (w.r.t. the respec-

tive sublog); and (ii) accurate and easy-to-interpret

clustering rules, explaining the dependence of process

variants on context factors.

As future work, we plan to combine the approach

with alternative trace clustering methods (for ac-

complishing the ﬁrst of algorithm

MVPM-mine

), and

more sophisticated measures for comparing work-

ﬂow schemas and for estimating their conformance.

We will also intend to test our approach on other

real (lowly-structured) business process logs, possi-

bly featuring a richer range of non-structural data.

REFERENCES

CLUS: A predictive clustering system. http://dtai.cs.

kuleuven.be/clus/.

Alves de Medeiros, A. K., van der Aalst, W. M. P., and Wei-

jters, A. J. M. M. (2008). Quantifying process equiva-

lence based on observed behavior. Data & Knowledge

Engineering, 64(1):55–74.

Blockeel, H. and Raedt, L. D. (1998). Top-down induction

of ﬁrst-order logical decision trees. Artiﬁcial Intelli-

gence, 101(1-2):285–297.

Bose, R. P. J. C. and van der Aalst, W. M. P. (2009a). Con-

text aware trace clustering: Towards improving pro-

cess mining results. In Proc. of SIAM International

Conference on Data Mining, pages 401–412.

Bose, R. P. J. C. and van der Aalst, W. M. P. (2009b).

Trace clustering based on conserved patterns: To-

wards achieving better process models. In Business

Process Management Workshops, pages 170–181.

Buijs, J., van Dongen, B., and van der Aalst, W. (2012). On

the role of ﬁtness, precision, generalization and sim-

plicity in process discovery. In On the Move to Mean-

ingful Internet Systems: OTM 2012, volume 7565,

pages 305–322.

Cramer, H. (1999). Mathematical Methods of Statistics.

Princeton University Press.

de Medeiros, A. A. (2006). Genetic Process Mining. Phd

thesis, Eindhoven University of Technology.

De Weerdt, J., van den Broucke, S., Sand Vanthienen, J.,

and Baesens, B. (2013). Active trace clustering for

improved process discovery. IEEE Trans. on Knowl.

and Data Eng., 25(12):2708–2720.

Ferreira, D., Zacarias, M., Malheiros, M., and Ferreira,

P. (2007). Approaching process mining with se-

quence clustering: Experiments and ﬁndings. In Proc.

of 5th Int. Conf. on Business Process Management

(BPM’07), pages 360–374.

Folino, F., Greco, G., Guzzo, A., and Pontieri, L. (2008).

Discovering multi-perspective process models: The

case of loosely-structured processes. In ICEIS 2008,

Revised Selected Papers, pages 130–143.

Folino, F., Greco, G., Guzzo, A., and Pontieri, L.

(2011). Mining usage scenarios in business pro-

cesses: Outlier-aware discovery and run-time predic-

tion. Data & Knowledge Engineering, 70(12):1005–

1029.

Folino, F., Guarascio, M., and Pontieri, L. (2012). Discover-

ing context-aware models for predicting business pro-

cess performances. In Proc. of 20th Intl. Conf. on

Cooperative Information Systems (CoopIS’12), pages

287–304.

ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems

216

Greco, G., Guzzo, A., Pontieri, L., and Sacc`a, D. (2006).

Discovering expressive process models by clustering

log traces. IEEE Trans. Knowl. and Data Eng., 18(8).

Kunze, M., Weidlich, M., and Weske, M. (2011). Behav-

ioral similarity: A proper metric. In Proc. of 9th Int.

Conf. on Business Process Management (BPM’11),

pages 166–181.

Rozinat, A. and van der Aalst, W. M. P. (2008). Confor-

mance checking of processes based on monitoring real

behavior. Information Systems, 33(1):64–95.

Song, M., G¨unther, C. W., and van der Aalst, W. (2008).

Trace clustering in process mining. In Proc. of Busi-

ness Process Management Workshops (BPI’08), pages

109–120.

Steeman, W. (2013). BPI challenge 2013, closed problems.

van der Aalst, W. (1998). The application of Petri nets to

worﬂow management. Journal of Circuits, Systems,

and Computers, 8(1):21–66.

van Der Aalst, W., Adriansyah, A., and van Dongen, B.

(2011). Causal nets: a modeling language tailored to-

wards process discovery. In Proc. of the 22nd Intl.

Conf. on Concurrency Theory, CONCUR’11, pages

28–42, Berlin, Heidelberg. Springer-Verlag.

van der Aalst, W. M. P., van Dongen, B. F., Herbst,

J., Maruster, L., Schimm, G., and Weijters, A. J.

M. M. (2003). Workﬂow mining: a survey of is-

sues and approaches. Data & Knowledge Engineer-

ing, 47(2):237–267.

Weidlich, M., Polyvyanyy, A., Desai, N., Mendling, J.,

and Weske, M. (2011). Process compliance analysis

based on behavioural proﬁles. Information Systems,

36(7):1009–1025.

Weijters, A. J. M. M. and Ribeiro, J. T. S. (2011). Flexi-

ble heuristics miner (FHM). In Proc. of IEEE Sympo-

sium on Computational Intelligence and Data Mining

(CIDM 2011), pages 310–317.

Weijters, A. J. M. M. and van der Aalst, W. M. P. (2003).

Rediscovering workﬂow models from event-based

data using Little Thumb. Integrated Computer-Aided

Engineering, 10(2):151–162.

OntheDiscoveryofExplainableandAccurateBehavioralModelsforComplexLowly-structuredBusinessProcesses

217