A Data-adaptive Trace Abstraction Approach to the
Prediction of Business Process Performances
Antonio Bevacqua¹, Marco Carnuccio¹, Francesco Folino², Massimo Guarascio² and Luigi Pontieri²
¹ DIMES Department, University of Calabria, via P. Bucci 41C, 87036, Rende, CS, Italy
² ICAR-CNR, National Research Council of Italy, via P. Bucci 41C, 87036, Rende, CS, Italy
Keywords:
Data Mining, Regression, Clustering, Business Process Analysis.
Abstract:
This paper presents a novel approach to the discovery of predictive process models, which are meant to support
the run-time prediction of some performance indicator (e.g., the remaining processing time) on new ongoing
process instances. To this purpose, we combine a series of data mining techniques (ranging from pattern min-
ing, to non-parametric regression and to predictive clustering) with ad-hoc data transformation and abstraction
mechanisms. As a result, a modular representation of the process is obtained, where different performance-
relevant variants of it are provided with separate regression models, and discriminated on the basis of context
information. Notably, the approach is capable of looking at the given log traces at a proper level of abstraction, in a pretty automatic and transparent fashion, which reduces the need for heavy intervention by the analyst (indeed a major drawback of previous solutions in the literature). The approach has been validated
on a real application scenario, with satisfactory results, in terms of both prediction accuracy and robustness.
1 INTRODUCTION
The general aim of process mining tech-
niques (van der Aalst et al., 2003) is to extract
information from historical log “traces” of a business
process (i.e., sequences of events registered during
different enactments of the process), in order to
help analyze and possibly improve it. An emerging
trend in this field (see, e.g., (van Dongen et al.,
2008; van der Aalst et al., 2011; Folino et al., 2012))
concerns the prediction of performance indicators
(defined on each process instance), as a way to
help improve future process enactments, through,
e.g., recommendation or risk analysis. In general,
these approaches face the problem by inducing
some kind of prediction model for the given per-
formance indicator, based on some suitable trace
abstraction function (mapping, e.g., the trace onto
the set/multiset of process tasks appearing in it).
In particular, a non-parametric regression model
is used in (van Dongen et al., 2008) to build the
prediction for a new (possibly partial) trace, based on
its similarity to a set of historical ones — where the similarity between two traces is evaluated by comparing their respective abstract views. However, such an instance-based scheme is likely to incur long prediction times (unsuitable for many real run-time environments), especially in the case of complex and flexible processes, where a large number of historical traces must be kept (and retrieved) to adequately capture the process's wide range of behaviors. A model-based
prediction scheme is conversely followed in (van der
Aalst et al., 2011), where an annotated finite-state
machine (FSM) model is induced from the input log, with states corresponding to abstract representations of log traces. The discovery of such FSM models was combined in (Folino et al., 2012) with a context-driven (predictive) clustering approach, so that different execution scenarios can be discovered for the process, each equipped with a distinct (more specific and more precise) local predictor.
In general, a critical issue in the induction of such prediction models, especially in the case of complex business processes, concerns the definition of a suitable trace abstraction function, capable of focusing on the core properties of the events (occurring in a process instance) that impact the most on its performance outcomes. In fact, as discussed in (van der Aalst et al., 2011), choosing the right abstraction level is a delicate task, where an optimal balance has to be reached between the risks of overfitting (i.e., having an overly detailed model, nearly replicating the training set, which will hardly provide accurate forecasts over unseen cases) and of underfitting (i.e., the model is too abstract and imprecise, both on the training cases and on new ones).

Bevacqua A., Carnuccio M., Folino F., Guarascio M. and Pontieri L.
A Data-adaptive Trace Abstraction Approach to the Prediction of Business Process Performances.
DOI: 10.5220/0004448700560065
In Proceedings of the 15th International Conference on Enterprise Information Systems (ICEIS-2013), pages 56-65
ISBN: 978-989-8565-59-4
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)
Previous approaches mainly leave the responsibility of tuning the abstraction level to the analyst, allowing her/him to select the general form of trace abstractions (e.g., sets/multisets/lists of tasks), and possibly to fix a maximal horizon threshold h — i.e., only the properties of the h most recent events in a trace can be used in its abstract view, so that older events are discarded. On the other hand, FSM-like models cannot effectively exploit non-structural (context-oriented) properties of the process instances, which might actually impact performances as well. In fact, including such data (in addition to tasks) in the construction of trace abstractions would clearly exacerbate the combinatorial explosion issue discussed above.
Core Idea and Contributions. We try to overcome
the above limitations by devising a novel approach, capable both of taking full advantage of "non-structural" context data, and of finding a good level of abstraction over the history of process instances, in a pretty automated and transparent fashion.
Our core belief is that handy (and yet accurate enough) prediction models can be learnt via various existing model-based regression algorithms (either parametric, such as, e.g., (Hardle and Mammen, 1993; Quinlan, 1992), or non-parametric, such as, e.g., (Hardle, 1990; Witten and Frank, 2005)), rather than resorting to an explicit representation of process states (like in (van der Aalst et al., 2011; Folino et al., 2012)) or to an instance-based approach, like in (van Dongen et al., 2008). This clearly requires that an adequate propositional representation of the given traces is preliminarily built, capturing both structural (i.e., task-related) and "non-structural" aspects. To this end, we propose to convert each process trace into a set or a multiset of process tasks, and let the regression method decide automatically which of the basic structural elements in such an abstracted view of the trace are to be used, and how, to make a forecast.
Moreover, we still leverage the idea of (Folino et al., 2012) of combining performance prediction with a predictive clustering technique (Blockeel and Raedt, 1998), in order to distinguish heterogeneous context-dependent execution scenarios ("variants") for the analyzed process, and eventually provide each of them with a specialized regressor. In fact, such an approach generally brings substantial gains in terms of readability and accuracy, besides explicitly showing the dependence of discovered clusters on context features, and speeding up (and possibly parallelizing) the computation of regression models — which are typically more compact, more precise and easier to read/evaluate/validate than a single regression model extracted out of the whole log. In fact, we believe that (as confirmed by the empirical results in Section 5) even very simple regression methods can furnish robust and accurate predictions, if combined with a properly devised clustering procedure. Moreover, this can even allow for a scalable usage of instance-based regression schemes (which are otherwise likely to take prohibitive prediction times on real logs), seeing as only the traces of a single cluster (selected on the basis of context features) would be scanned.
We pinpoint that the target features used in the clustering (where context features conversely act as descriptive attributes) are derived from frequent structural patterns (still defined as sets or bags of tasks), instead of directly using the abstract representations extracted from the log, as done in (Folino et al., 2012). These patterns are discovered efficiently via an ad-hoc a-priori-like (Agrawal and Srikant, 1994) method, where the analyst is allowed to specify a minimum support threshold, and possibly an additional ("gap") constraint, both enforced in the very generation of the patterns. Notably, such an approach frees the analyst from the burden of explicitly setting the abstraction level (i.e., the size of patterns, in our case), which is determined instead in a data-driven way.
Organization. The rest of the paper is structured as
follows. After introducing some handy notation and
core concepts in Section 2, we present the proposed
approach, in an algorithmic form, in Section 3. We
then discuss the implementation of the method in Section 4, and an experimental evaluation on real data in Section 5. Section 6 finally presents some concluding
remarks and future work directions.
2 FORMAL FRAMEWORK
2.1 Logs and Performances
As usually done in the literature, we assume that for each process instance (a.k.a. "case") a trace is recorded, storing the sequence of events that happened during its unfolding. Let T be the universe of all (possibly partial) traces that may appear in any log of the process under analysis. For any trace τ ∈ T, len(τ) is the number of events in τ, while τ[i] is the i-th event of τ, for i = 1..len(τ), with task(τ[i]) and time(τ[i]) denoting the task and timestamp of τ[i], respectively. We also assume that the first event of each trace is always associated with a unique "initial" task A_0 (possibly added artificially before analyzing the log), and its timestamp registers the time when the corresponding process instance started.
Let us also assume that, for any trace τ, a tuple context(τ) of data is stored in the log to keep information about the execution context of τ — like in (Folino et al., 2012), this tuple may gather both data properties and environmental features (characterizing the state of the BPM system). For ease of notation, let A_T denote the set of all the tasks (a.k.a. activities) that may occur in some trace of T, and context(T) be the space of context vectors — i.e., A_T = ∪_{τ∈T} tasks(τ), and context(T) = {context(τ) | τ ∈ T}. Further, τ(i] is the prefix (sub-)trace containing the first i events of a trace τ and the same context data (i.e., context(τ(i]) = context(τ)), for i = 0..len(τ). A log L is a finite subset of T, while the prefix set of L, denoted by P(L), is the set of all the prefixes of L's traces, i.e., P(L) = {τ(i] | τ ∈ L and 1 ≤ i ≤ len(τ)}.
Let µ̂ : T → ℝ be an (unknown) function assigning a performance value to any (possibly unfinished) process trace. For the sake of concreteness, we will focus hereinafter on the special case where the target performance value associated with each trace is the remaining process time (measured, e.g., in days, hours, or in finer-grain units), i.e., the time needed to finish the corresponding process enactment. Moreover, we assume that performance values are known for all prefix traces in P(L), for any given log L. In fact, for each trace τ, the (actual) remaining-time value of τ(i] is µ̂(τ(i]) = time(τ[len(τ)]) − time(τ[i]).
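The remaining-time target over the prefixes of a trace can be sketched in a few lines (an illustrative Python fragment; the event encoding and function name are ours, not the paper's implementation):

```python
def remaining_times(trace):
    """trace: list of (task, timestamp) events, timestamps in, e.g., hours.
    Returns mu_hat(tau(i]) = time(tau[len(tau)]) - time(tau[i]) for i = 1..len(tau)."""
    end = trace[-1][1]  # timestamp of the last event
    return [end - ts for _, ts in trace]

# A 3-event trace, opened by the artificial initial task A0:
tau = [("A0", 0.0), ("MOV", 2.0), ("OUT", 5.0)]
print(remaining_times(tau))  # [5.0, 3.0, 0.0]
```

Note how the target of the full trace is always 0, so only proper prefixes carry informative remaining-time values.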
A (predictive) Process Performance Model (PPM) is a model that can estimate the unknown performance value (i.e., the remaining time, in our setting) of a process enactment, based on the contents of the corresponding trace. Such a model can be viewed as a function µ : T → ℝ estimating µ̂ all over the trace universe — which also includes the prefix traces of all possible unfinished enactments of the process. Learning a PPM hence amounts to solving a particular induction problem, where the training set takes the form of a log L, and the value µ̂(τ) of the target measure is known for each (sub-)trace τ ∈ P(L).
Recent approaches to this problem (van der Aalst et al., 2011; van Dongen et al., 2008; Folino et al., 2012) all leverage the basic idea of applying some suitable abstraction function to process traces, with the ultimate aim of capturing only those facets of the registered events that influence process performances the most — while disregarding minor details.
2.2 Trace Abstraction
An abstracted (structural) view of a trace gives a concise description of the tasks executed during the corresponding process enactment. Two common ways of building such a view consist in simply regarding the trace as a multiset (a.k.a. bag) or as a set of tasks; both are formally defined below.
Definition 1 (Structural Trace Abstraction). Let T be a trace universe and A_0, ..., A_n be the tasks in A_T. A structural (trace-)abstraction function struct_mode : T → R_T^mode is a function mapping each trace τ ∈ T to an abstract representation struct_mode(τ), taken from an abstractions' space R_T^mode. Two concrete instantiations of this function, denoted by struct_bag : T → ℕ^n (resp., struct_set : T → {0,1}^n), are defined next, which map each trace τ ∈ T onto a bag-based (resp., set-based) representation of its structure: (i) struct_bag(τ) = ⟨count(A_0,τ), ..., count(A_n,τ)⟩, where count(A_i,τ) is the number of times that task A_i occurs in τ; and (ii) struct_set(τ) = ⟨occ(A_0,τ), ..., occ(A_n,τ)⟩, where occ(A_i,τ) = true iff count(A_i,τ) > 0.
The two concrete abstraction "modes" (namely, bag and set) defined above summarize any trace τ into a vector, where each component corresponds to a single process task A_i, and stores either the number of times that A_i appears in the trace τ, or (respectively) a boolean value indicating whether A_i occurs in τ or not. Notice that, in principle, we could define abstract trace representations as sets/bags over another property of the events (e.g., the executor, instead of the task executed), or even over a combination of event properties (e.g., the task plus who performed it).
Example 1. Let us consider a real-life case study pertaining to a transshipment process, used for the experiments described in Section 5. Basically, for each container c passing through the harbor, a distinct log trace τ_c is stored, registering all the tasks applied to c, which may include: moving c by means of a straddle carrier (MOV), swapping c with another container (SHF), and loading c onto a ship by way of a shore crane (OUT). Let τ be a log trace storing a sequence ⟨e_1, e_2, e_3⟩ of three events such that task(e_1) = task(e_2) = MOV and task(e_3) = OUT. With regard to the abstract trace representations introduced in Def. 1, it is easy to see that struct_bag(τ) = [2,0,1], and struct_set(τ) = [1,0,1] — where the traces are mapped into a vector space consisting of the dimensions A_1 ≡ MOV, A_2 ≡ SHF, A_3 ≡ OUT.
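The two abstraction modes of Def. 1, instantiated on Example 1, can be reproduced directly (an illustrative Python sketch; the task universe and function names are our own):

```python
from collections import Counter

TASKS = ["MOV", "SHF", "OUT"]  # dimensions A1, A2, A3 from Example 1

def struct_bag(trace):
    # count(Ai, tau): number of occurrences of each task in the trace
    counts = Counter(trace)
    return [counts[a] for a in TASKS]

def struct_set(trace):
    # occ(Ai, tau): 1 iff count(Ai, tau) > 0
    return [1 if n > 0 else 0 for n in struct_bag(trace)]

tau = ["MOV", "MOV", "OUT"]   # task(e1) = task(e2) = MOV, task(e3) = OUT
print(struct_bag(tau))  # [2, 0, 1]
print(struct_set(tau))  # [1, 0, 1]
```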
The structural abstraction functions in Def. 1 are a subset of those used in previous approaches to the discovery of predictive process models (van der Aalst et al., 2011; van Dongen et al., 2008; Folino et al., 2012). To be more precise, (van Dongen et al., 2008) also considers the possibility of mapping a trace into a vector of task durations, as well as of combining multiple structural abstractions with data attributes of the traces, while the other two approaches also allow for abstracting a trace into the list of tasks appearing in it (as an alternative to bag/set-oriented abstractions).
2.3 Clustering-based PPM
Like in (Folino et al., 2012), we here consider a special kind of PPM, relying on a clustering of the process traces. Such a model, described below, is indeed a predictive clustering model, where context data play the role of descriptive attributes, whereas the target variables are derived from specific performance values, extracted out of the traces.
Definition 2 (Clustering-based Performance Prediction Model (CB-PPM)). Let L be a log (over T), with context features context(T), and µ̂ : T → ℝ be a performance measure, known for all τ ∈ P(L). Then a clustering-based performance prediction model (CB-PPM) for L is a pair M = ⟨c, ⟨µ_1, ..., µ_k⟩⟩, which encodes the unknown performance function µ̂ in terms of a predictive clustering model, with k denoting the associated number of clusters (found for L). More specifically, c is a partitioning function, which assigns any (possibly novel) trace to one of the clusters (based on its context data), while each µ_i is the PPM of the i-th cluster — i.e., c : context(T) → {1, ..., k}, and µ_i : T → ℝ, for i ∈ {1, ..., k}. The performance µ̂(τ) of any (partial) trace τ is eventually estimated as µ_j(τ), where j = c(context(τ)).
In this way, each cluster has its own PPM, encoding how µ̂ depends on the structure (and, possibly, the context) of a trace, relative to that cluster. The prediction for each trace is hence made with the predictor of the cluster it is assigned to (by function c).
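The dispatching scheme of Def. 2 amounts to very little code. The following toy sketch (cluster assignment rule, regressors, and all names are hypothetical, purely for illustration) shows how c and the per-cluster predictors µ_i cooperate:

```python
def make_cb_ppm(c, predictors):
    """c: context vector -> cluster index in {0..k-1};
    predictors: list of per-cluster functions mu_i(trace) -> float."""
    def mu(trace, context):
        j = c(context)               # cluster chosen from context data only
        return predictors[j](trace)  # cluster-specific regressor does the rest
    return mu

# Toy model with k = 2 clusters, discriminated by a single context feature:
c = lambda ctx: 0 if ctx["workload"] < 10 else 1
mu = make_cb_ppm(c, [lambda t: 2.0 * len(t), lambda t: 5.0 * len(t)])
print(mu(["A0", "MOV"], {"workload": 3}))   # 4.0
print(mu(["A0", "MOV"], {"workload": 20}))  # 10.0
```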
In general, such an articulated kind of PPM can be
built by inducing both a predictive clustering model
and multiple PPMs (as the building blocks imple-
menting c and all µ
i
, respectively). In particular,
in (Folino et al., 2012), the latter task is accomplished
by using the method in (van der Aalst et al., 2011),
so that each cluster is eventually provided with an
A-FSM model. As mentioned in Section 1, in or-
der to develop an easier-to-use and data-adaptive ap-
proach, we will not use A-FSM models (which typi-
cally require a careful explicit setting of the abstrac-
tion level), and will rather employ one of the various
regression methods available for propositional data.
To this purpose, an ad-hoc view of the log will be pro-
duced, where both the context-oriented and structure-
oriented (cf. Def. 1) features of a trace are used as
descriptive attributes, whereas the target attributes are
derived by projecting the trace onto a space of struc-
tural patterns. These patterns, which will be com-
puted via an ad-hoc data mining method, are de-
scribed in detail in the following.
2.4 Structural Patterns
In our setting, structural patterns are meant to capture regularities in the structure of traces, abstracted via sets or bags of tasks. In particular, these patterns can be regarded as (constrained) sub-sets or sub-bags of tasks that appear frequently in the (abstracted) log traces. Let mode ∈ {bag, set} denote a given abstraction criterion, T be the reference trace universe, and A_T = {A_0, ..., A_n} be its associated process tasks. Then, a (structural) pattern w.r.t. T and mode simply is an element p of the abstractions' space R_T^mode — over which the structural trace-abstraction function struct_mode indeed ranges. The size of p, denoted by size(p), is the number of distinct tasks in p (i.e., the number of p's components with a positive value).
Since a structural pattern has the same form as a (structural) trace abstraction, we can apply the usual set/bag containment operators to both. Specifically, given two elements p_1 and p_2 of R_T^mode (each being either a pattern or a full trace representation), we say that p_2 contains p_1 (and, symmetrically, p_1 is contained in p_2), denoted by p_1 ⊑ p_2, if p_1[j] ≤ p_2[j] for j = 1, ..., n — where, for i = 1, 2, p_i[j] is the j-th component of p_i, viewed as a vector in D^n (cf. Def. 1).
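Since patterns and trace abstractions share the same vector form, the containment check is a plain componentwise comparison (a minimal sketch; the function name is ours):

```python
def contained_in(p1, p2):
    """p1 ⊑ p2 iff p1[j] <= p2[j] for every component j."""
    return all(a <= b for a, b in zip(p1, p2))

# Bag-mode vectors over the dimensions (MOV, SHF, OUT) of Example 1:
print(contained_in([1, 0, 1], [2, 0, 1]))  # True  (one MOV and one OUT occur in the trace)
print(contained_in([0, 1, 0], [2, 0, 1]))  # False (no SHF occurs in the trace)
```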
As we want to eventually use such patterns for clustering purposes, we are interested in those that capture significant behavioral schemes. In particular, an important property required of such patterns is that they occur frequently in the given log (otherwise, small, scarcely significant clusters are likely to be discovered), as specified in the following notion of support.
Definition 3 (Pattern Support and Footprints). Let τ ∈ T be a trace, mode be a given abstraction mode, and p be a pattern (w.r.t. T and mode). Then, we say that τ supports p, denoted by τ ⊒ p, if its corresponding structural abstraction contains p (i.e., p ⊑ struct_mode(τ)). In such a case, a footprint of p on τ is a subset F = {f_1, ..., f_k} ⊆ {1, ..., len(τ)} of positions within τ, such that struct_mode(⟨τ[f_1], ..., τ[f_k]⟩) = p. Moreover, gap(F) is the number of events in τ which match no position of F and appear in between a pair of matching events — i.e., gap(F) = max_{f_i ∈ F} {f_i} − min_{f_i ∈ F} {f_i} − |F| + 1. Finally, with a little abuse of notation, we denote gap(p, τ) = min({∞} ∪ {gap(F) | F is a footprint of p on τ}).
In words, a footprint F of a pattern p, on a trace τ
supporting it, identifies a subsequence of τ which (i)
contains the events occurring in τ at one of the po-
sitions in F, and (ii) has a structural representation
AData-adaptiveTraceAbstractionApproachtothePredictionofBusinessProcessPerformances
59
Input: A log L over a trace universe T, with associated tasks AS = {A_1, ..., A_n} and target performance measure µ̂ (known over P(L)), an abstraction mode m ∈ {set, bag} (cf. Def. 1), three thresholds minSupp ∈ [0,1), maxGap ∈ ℕ ∪ {∞}, and kTop ∈ ℕ⁺ ∪ {∞}, and a base regression method REGR.
Output: A CB-PPM model for L (fully encoding µ̂ all over T).
Method: Perform the following steps:
1. Let context(τ) be the vector of context data associated with each τ ∈ L;
2. Build a structural view S_L of P(L), by replacing each τ ∈ P(L) with a transaction-like representation of struct_m(τ);
3. RSP := minePatterns(S_L, m, minSupp, maxGap);
4. RSP := filterPatterns(RSP, kTop);
5. Let RSP = {p_1, ..., p_s};
6. Compute the projected performance value val(τ, p_i), for each τ ∈ L and i = 1..s;
7. Build a log sketch P_L for L, by using both the context data and the RSP-projected performances;
8. Learn a PCT T, using context(τ) (resp., val(τ, p_i), i = 1..s) as descriptive (resp., target) features for each τ ∈ L;
9. Let L[1], ..., L[k] denote the discovered clusters;
10. for each L[i] do
11.   Induce a regression model ppm_i out of P(L[i]), using method REGR — regarding, for each τ ∈ P(L[i]), context(τ) and struct_m(τ) as the input values, and the performance measurement µ̂(τ) as the target value;
12.   Store ppm_i as the implementation of the prediction function µ_i : T → ℝ (for cluster i);
13. end
14. return ⟨c, ⟨µ_1, ..., µ_k⟩⟩.
Figure 1: Algorithm AA-PPM Discovery.
coinciding with p. Clearly, if p is not supported by τ,
gap(p,τ) will take an infinite value.
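Under the set abstraction mode, footprints and the gap measure of Def. 3 can be computed by brute force, as in the following illustrative Python sketch (the function names are ours; a real implementation would avoid the exhaustive enumeration of position subsets):

```python
from itertools import combinations

def gap(pattern, trace):
    """Minimum of max(F) - min(F) - |F| + 1 over all footprints F of a
    set-mode pattern on the trace; inf if the trace does not support it."""
    best = float("inf")
    for size in range(len(pattern), len(trace) + 1):
        for F in combinations(range(len(trace)), size):
            if {trace[i] for i in F} == set(pattern):  # footprint check (set mode)
                best = min(best, max(F) - min(F) - len(F) + 1)
    return best

# The trace discussed in Section 3.1: ..., a, b, x1, x2, c, x3, d, ...
tau = ["a", "b", "x1", "x2", "c", "x3", "d"]
print(gap({"a", "b", "c", "d"}, tau))  # 3
print(gap({"a", "c", "d"}, tau))       # 4 (dropping an internal element raises the gap)
print(gap({"a", "b", "c"}, tau))       # 2 (dropping an extreme element lowers it)
print(gap({"a", "z"}, tau))            # inf (pattern not supported)
```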
3 SOLUTION ALGORITHM
Figure 1 illustrates the main steps of our approach to
the discovery of a CB-PPM model, in the form of an
algorithm, named
AA-PPM Discovery
. Essentially,
the problem is approached in three main phases.
In the first phase (Steps 1-5), a set of (frequent) structural patterns is extracted from the log, which are deemed to capture the main behavioral schemes of the process, as concerns the dependence of performance on the execution of tasks. To this end, after converting the structure of each (possibly partial) trace τ into an itemset (Step 2)¹, we compute all the structural patterns (i.e., sub-sets of various sizes) that occur frequently in the log and effectively summarize its behaviors. More precisely, we first compute the set {p ∈ R_T^m | supp_maxGap(p, S_L) ≥ minSupp} (cf. Def. 3), by using function minePatterns, and store it in RSP — note that this set will never be empty, since (as an extreme case) at least the singleton pattern containing A_0 is frequent (no matter the values of minSupp, m and maxGap). These patterns are then filtered by function filterPatterns, which selects the kTop most relevant patterns among them. Notably, we can still use all the discovered patterns, by fixing kTop = ∞ (no real filter is applied in this case).
¹ In particular, as to bags, any s = struct_bag(τ) ∈ ℕ^n is turned into {(A_i, j) | 0 ≤ i ≤ n, s[i] > 0 and 1 ≤ j ≤ s[i]}.
Both these functions are explained in details later on.
In the second phase, the selected patterns are used
to associate a series of numerical variables with all
traces (Step 7), and to carry out a predictive clustering
of them (Step 8). To this end, a propositional view of
the log, here named log sketch, is produced by trans-
forming each trace into a tuple, where context prop-
erties play as are descriptive attributes and the projec-
tion onto the space of selected patterns are the target
numerical features. Specifically, any selected pattern
p gives rise to a target (performance) feature, such
that the value val(τ, p) taken by it on any trace τ is be
computed as follows: (i) val(τ, p) =
NULL
, if τ 0 p,
or (ii) val(τ, p) = ˆµ(τ( j
]), where j
is the biggest in-
dex j {1,. ..,len(τ)} such that τ( j] p. Like in
(Folino et al., 2012), the clustering is computed by in-
ducing a Predictive Clustering Tree (PCT) (Blockeel
and Raedt, 1998) from the log sketch (Step 8).
Finally, each cluster is equipped with a basic (not
clustering-based) PPM model, by using some suit-
able regression method (chosen through parameter
REGR), provided with a dataset encoding all the pre-
fixes that can be derived from the traces assigned to
the cluster. Specifically, each such prefix τ is en-
coded as a tuple where context(τ) and struct
m
(τ) are
regarded as input values, while the associated perfor-
mance measurement ˆµ(τ) represents the value of the
numerical target variable that is to be estimated.
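As a sketch of this third phase, the snippet below induces a toy per-cluster predictor with the simplest instance-based method (1-nearest-neighbour, i.e., IB-k with k = 1); the feature encoding and names are illustrative, not the actual REGR implementations used by the system:

```python
def fit_1nn(prefixes):
    """prefixes: list of (feature_vector, remaining_time) pairs from one cluster.
    Returns a predictor that copies the target of the closest training prefix."""
    def predict(x):
        def dist(row):
            v = row[0]
            return sum((a - b) ** 2 for a, b in zip(v, x))  # squared Euclidean
        return min(prefixes, key=dist)[1]
    return predict

# Toy cluster: struct_bag vectors over tasks (MOV, SHF, OUT) as input features.
train = [([1, 0, 0], 6.0), ([2, 0, 0], 3.0), ([2, 0, 1], 0.0)]
ppm = fit_1nn(train)
print(ppm([2, 0, 0]))  # 3.0 (an exact match among the training prefixes)
```

Within a CB-PPM, such an instance-based scheme stays scalable because each predictor only scans the prefixes of its own cluster.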
3.1 Function minePatterns
This function computes all the patterns, of any size, that get a support equal to or higher than minSupp in a transactional view of trace structures. Notably, the function does not require the analyst to specify the size of each pattern (differently from the horizon threshold h used in previous methods), which is instead determined automatically, in a data-driven way. However, it allows for possibly fixing a finite maxGap threshold for the gaps admitted between patterns and traces, in case the analyst wants to keep some more details on the actual sequencing of the tasks. Technically, this constraint is used to introduce a refined support function, defined as follows:

supp_maxGap(p, L) = |{τ ∈ L | τ ⊒ p and gap(p, τ) < maxGap + 1}|    (1)
Note that this function coincides with the standard support function when maxGap = ∞ (under the usual convention that ∞ + 1 = ∞). As a result, minePatterns will return each pattern p ∈ R_T^m such that supp_maxGap(p, L) ≥ minSupp. It can be proved that this computation can be done in a level-wise way, despite the fact that the support function does not enjoy any (anti-)monotonicity property.
The basic computation scheme of function minePatterns, sketched in Figure 2, assumes that: m is the chosen (structural) trace abstraction mode, minSupp and maxGap are the thresholds specifying the support and gap requirements, respectively, and L is the original log, taken from a reference trace universe T. In practice, the function actually works on a transactional encoding of L, storing a set-oriented representation of struct_m(τ) for each τ ∈ P(L).
Function minePatterns(S_L: transaction set; minSupp: real; maxGap: integer; m: {set, bag}): a set of frequent structural patterns (i.e., a subset of R_T^m)
  I_1 := {x ∈ AS | |{t ∈ S_L | x occurs in t}| ≥ minSupp × |S_L|};
  I_k := I_1;
  F := I_1;
  while I_k ≠ ∅ do
    C := {p' | p' = p ∪ {x}, p ∈ I_k, x ∈ I_1, x ∉ p};
    for each τ ∈ P(L) do
      C_τ := {p ∈ C | τ ⊒ p and gap(p, τ) < maxGap + 1};
      for each p ∈ C_τ do
        count[p] := count[p] + 1;
      end
    end
    I_k := {p ∈ C | count[p] ≥ minSupp × |S_L|};
    F := F ∪ I_k;
  end
  return F
Figure 2: Function minePatterns.
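For concreteness, here is a brute-force Python rendition of the level-wise scheme of Figure 2, restricted to the set mode (our own sketch: the gap computation enumerates footprints naively, which a real implementation would replace with something far cheaper):

```python
from itertools import combinations

def gap(pattern, trace):
    """min over all footprints F of max(F) - min(F) - |F| + 1; inf if unsupported."""
    best = float("inf")
    for size in range(len(pattern), len(trace) + 1):
        for F in combinations(range(len(trace)), size):
            if {trace[i] for i in F} == set(pattern):
                best = min(best, max(F) - min(F) - len(F) + 1)
    return best

def mine_patterns(traces, min_supp, max_gap=float("inf")):
    """Level-wise (a-priori-like) mining of frequent task sets; the support
    and gap constraints are enforced at every level, as in Figure 2."""
    n = len(traces)
    items = {x for t in traces for x in t}
    def supported(p, t):
        return gap(p, t) < max_gap + 1  # Eq. (1): gap within the budget
    def supp(p):
        return sum(1 for t in traces if supported(p, t))
    level = {frozenset([x]) for x in items if supp({x}) >= min_supp * n}
    frequent = set(level)
    while level:
        cands = {p | {x} for p in level for x in items if x not in p}
        level = {p for p in cands if supp(p) >= min_supp * n}
        frequent |= level
    return frequent

traces = [["a", "b", "x", "c"], ["a", "b", "c"], ["a", "z", "b"]]
print(sorted(sorted(p) for p in mine_patterns(traces, 0.6)))
# The mined set patterns include {'a','b','c'}; with max_gap=0, the
# non-contiguous pattern {'a','c'} is pruned away.
```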
Clearly, all the above constraints (including the gap one) are enforced at each level of the pattern generation procedure, with the advantage of shrinking both the amount of patterns generated at each step (k) and the overall computation time.
This can be done with no risk of endangering the correctness and completeness of the results, as informally proven in the following.
Let us first consider the case maxGap = ∞. As, in this case, function supp_maxGap (cf. Eq. 1) is anti-monotonic w.r.t. pattern size, when computing the patterns of size k we can safely filter out any pattern p such that supp(p) < minSupp — indeed, there cannot be any pattern p' such that size(p') = k + 1, p ⊑ p' and supp(p') ≥ minSupp.
Conversely, such a monotonicity property is not enjoyed by gap constraints. Indeed, when removing an element from a pattern, the resulting pattern may contain a higher number of spurious activities (i.e., may get a higher gap score) on some traces. For example, consider the pattern p = {a, b, c, d} (of size 4) and the trace τ encoding the sequence y_1, ..., y_q, a, b, x_1, x_2, c, x_3, d, z_1, ..., z_s of task labels. For the sake of simplicity, and w.l.o.g., let us assume that the tasks a, b, c, d do not occur in any other part of the trace (i.e., {a, b, c, d} ∩ {y_1, ..., y_q, z_1, ..., z_s} = ∅), so that we can focus on the subsequence of tasks a, b, x_1, x_2, c, x_3, d — which actually determines the gap score for p and τ. Clearly, for every pattern p' obtained by removing an "internal" element (i.e., different from both a and d, in the example) from a given pattern p, it holds gap(p', τ) ≥ gap(p, τ) — e.g., gap({a, c, d}, τ) = gap({a, b, d}, τ) = 4 > 3 = gap(p, τ). However, the two patterns obtained by removing one of the two "extreme" elements (i.e., either a or d) are guaranteed to have the same gap as p, or even a lower one. For example, gap({b, c, d}, τ) = 3 = gap(p, τ), and gap({a, b, c}, τ) = 2 < gap(p, τ).
Consequently, in the computation scheme depicted above, for any relevant pattern p of level k > 1 (i.e., for any k-sized pattern p such that supp_maxGap(p, L) ≥ minSupp), there are at least two relevant sub-patterns of level k − 1 which will produce p when merged together.
3.2 Function filterPatterns
This function is meant to select a subset of the most significant and useful patterns, in order to allow for a more effective and more scalable computation of predictive clustering models. In particular, we want to prevent the case where the PCT learning algorithm has to work in a sparse and high-dimensional target space, where high-quality clusters will hardly be found, while long computation times are likely to occur. Hence, we allow the analyst to ask for keeping only the kTop patterns that seem to discriminate the main performance profiles at best. To this end, we employ a variant of the scoring function φ proposed in (Folino et al., 2012) (giving score 0 to every feature with no positive correlation with context data), which is essentially meant to give preference to patterns ensuring higher values of the following measures: (i) support, (ii) correlation with the context attributes, and (iii) variability of the associated performance values (i.e., val(τ, p), with τ ranging over L). Further details can be found in (Folino et al., 2012).
4 IMPLEMENTATION
The prototype system AA-TP (Adaptive-Abstraction Time Prediction) specializes algorithm AA-PPM Discovery to the case where the remaining processing time is the target performance measure. The logical architecture of the system is sketched in Figure 3, where the arrows between blocks stand for information flows, while Log Data is a collection of process logs represented in the MXML (or XES) format (van der Aalst et al., 2007).
The Scenario Discovery module is responsible for identifying behaviorally homogeneous groups of traces, in terms of both context data and remaining times. In particular, the discovery of different trace clusters is carried out by the Predictive Clustering submodule, which groups traces sharing similar descriptive and target values. This latter module leverages the CLUS system (DLAI Group, 1998), a predictive clustering framework for inducing PCT models out of propositional data.
In this regard, the Log-View Generator submod-
ule acts as a “translator” which converts all log traces
into propositional tuples, according to the ARFF for-
mat used in CLUS. As explained above, this mapping
relies on the explicit representation of both context
data and target attributes, derived from the original
log. In this process, each trace can be enriched with
additional (environmental) context features, such as
workload indicators and aggregated time dimensions,
in case they are not explicitly stored in the log.
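The mapping can be sketched as follows; the field names, the derived context features, and the simple containment test for patterns are illustrative assumptions, not AA-TP's actual schema (which also emits the ARFF syntax expected by CLUS).

```python
from datetime import datetime

def trace_to_tuple(trace, context, frequent_patterns):
    """Illustrative 'log-view' mapping: one propositional tuple per trace,
    with context data as descriptive attributes, plus pattern-occurrence
    flags and a time measure as target attributes."""
    events = trace["events"]                     # [(activity, timestamp), ...]
    first_ts, last_ts = events[0][1], events[-1][1]
    row = dict(context)                          # e.g. origin port, weight, ...
    # environmental context features, derived when absent from the log
    row["arrival_hour"] = first_ts.hour
    row["arrival_weekday"] = first_ts.weekday()
    activities = [a for a, _ in events]
    for p in frequent_patterns:                  # pattern flags (targets)
        row[f"pat_{'_'.join(p)}"] = int(all(a in activities for a in p))
    # total dwell time, from which remaining times of prefixes are derived
    row["elapsed_secs"] = (last_ts - first_ts).total_seconds()
    return row

t = {"events": [("UNLOAD", datetime(2006, 1, 9, 8, 0)),
                ("MOVE",   datetime(2006, 1, 9, 9, 30))]}
print(trace_to_tuple(t, {"origin": "GOA", "weight": 21.5}, [("UNLOAD", "MOVE")]))
```

Each resulting tuple is flat (propositional), so any attribute-value learner can consume it.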
These context-enriched traces are then mapped into
a transactional form and delivered to the Pattern Min-
ing module. This latter, in its turn, will provide the
Log-View Generator with a set of frequent patterns,
which are to be eventually used as target features for
the predictive clustering step. Specifically, the extrac-
tion of frequent patterns is carried out by the Pattern
Mining module in two steps: first, patterns are
mined (by the Pattern Extractor submodule), and then
the most relevant of them are selected (by the Pattern
Filter submodule).

Figure 3: Logical architecture of the AA-TP system.
Once the predictive clustering procedure has been
completed, the traces (each labeled with a cluster ID)
are delivered to the Time Predictors Learning mod-
ule, which implements a range of classical regression
algorithms (including, in particular, IB-k, Linear Re-
gression, and RepTree). These algorithms are eventu-
ally used to induce the local predictor (i.e., PPM) of
each discovered cluster, which will compose (together
with the logical rules discriminating among the clus-
ters) the overall CB-PPM model, returned as the main
result. For inspection purposes and further analysis,
such a model is stored in a repository. Finally, the
Evaluation module helps the user assess the quality
of time predictions on a generic test set.
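At run time, using such a composite model amounts to routing a (partial) trace to one cluster via the discriminating rules, and delegating the prediction to that cluster's local PPM. The rule and regressor shapes below are assumptions made for illustration:

```python
def predict_remaining_time(trace_tuple, cb_ppm):
    """Run-time use of a CB-PPM model (sketch): the first matching
    clustering rule selects a discovered cluster, whose local regressor
    (PPM) produces the prediction for the ongoing trace."""
    for rule, regressor in cb_ppm:   # (predicate over context, local PPM)
        if rule(trace_tuple):
            return regressor(trace_tuple)
    raise ValueError("no cluster rule matched the trace")

# hypothetical model with two context-dependent process variants
cb_ppm = [
    (lambda t: t["workload"] > 20, lambda t: 8.0 + 0.5 * t["workload"]),
    (lambda t: True,               lambda t: 2.0 + 0.1 * t["workload"]),
]
print(predict_remaining_time({"workload": 30}, cb_ppm))   # -> 23.0
```

Only the selected cluster's predictor is ever consulted, which is what keeps run-time prediction cheap compared to searching the whole log.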
5 EXPERIMENTS
This section illustrates some experimental activities
that we conducted, on real data, with the prototype
system AA-TP, implementing a specialized version of
the AA-PPM Discovery algorithm, where the target per-
formance measure is the remaining processing time
(i.e., the time needed to complete a partial process in-
stance), as explained in the previous section.
5.1 Testbed
The experiments were performed on the logs of a real
transshipment system, mentioned in Example 1; more
precisely, on a sample of 5336 traces, corresponding
to all the containers that passed through the system in
the first third of year 2006.
The log stores a series of logistic activities applied
to each container passing through a maritime termi-
nal. Basically, each container is unloaded from a
ship and temporarily placed near to the dock, un-
til it is carried to some suitable yard slot for be-
ing stocked. Symmetrically, at boarding time, the
container is first placed in a yard area close to the
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
62
(a) The effect of parameters on rmse results.
(b) The effect of parameters on mae results.
(c) The effect of parameters on mape results.
Figure 4: Sensitivity of AA-TP w.r.t. its parameters, for different kinds of regressors.
dock, and then loaded on a cargo. Different kinds
of vehicles can be used for moving a container, in-
cluding, e.g., cranes, “straddle-carriers”, and “multi-
trailers”. This basic life cycle may be extended with
additional transfers, devoted to make the container
approach its final embark point or to leave room for
other ones. Several data attributes are available for
each container as context data (of the correspond-
ing process instance), including: the origin and final
ports, its previous and next calls, various properties
of the ship unloading it, physical features (such as,
e.g., size, weight), and some information about its
contents. Like in (Folino et al., 2012), we also consid-
ered a few more (environment-oriented) context fea-
tures for each container: the hour (resp., day of the
week, month) when it arrived, and the total number of
containers that were in the port at that moment.
Considering the remaining processing time as the
AData-adaptiveTraceAbstractionApproachtothePredictionofBusinessProcessPerformances
63
Table 1: Errors (avg±stdDev) made by AA-TP and its competitors, when using the bag abstraction mode.

        AA-TP (IB-k)    AA-TP (RepTree)   CA-TP           FSM
rmse    0.205±0.125     0.203±0.082       0.291±0.121     0.505±0.059
mae     0.064±0.058     0.073±0.033       0.142±0.071     0.259±0.008
mape    0.119±0.142     0.189±0.136       0.704±0.302     0.961±0.040

Table 2: Errors (avg±stdDev) made by AA-TP and its competitors, when using the set abstraction mode.

        AA-TP (IB-k)    AA-TP (RepTree)   CA-TP           FSM
rmse    0.287±0.123     0.286±0.084       0.750±0.120     0.752±0.037
mae     0.105±0.061     0.112±0.035       0.447±0.077     0.475±0.009
mape    0.227±0.131     0.267±0.060       2.816±0.303     2.892±0.206
target performance measure, we will measure predic-
tion effectiveness by way of three classic error metrics
(computed via 10-fold cross validation): root mean
squared error (rmse), mean absolute error (mae), and
mean absolute percentage error (mape). For an easier
interpretation of results, the former two metrics will
be normalized w.r.t. the average dwell-time (ADT),
i.e., the average length of stay over all the containers
that passed through the terminal. In this way, all the
quality metrics will be dimensionless (and hopefully
ranging over [0,1]). Moreover, for the sake of sta-
tistical significance, all the error results shown in the
following have been averaged over 10 trials.
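The three metrics, with the described normalization, can be computed as below. For illustration the ADT is approximated as the mean of the actual values; in the paper it is the average dwell time over all containers, a constant of the testbed.

```python
import math

def normalized_errors(actual, predicted):
    """The three evaluation metrics used in the experiments: rmse and mae,
    both divided by the average dwell time (ADT) so that they become
    dimensionless, plus mape (which is already a ratio)."""
    n = len(actual)
    adt = sum(actual) / n   # stand-in for the ADT of the testbed
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    mape = sum(abs(a - p) / a for a, p in zip(actual, predicted) if a != 0) / n
    return rmse / adt, mae / adt, mape

# hypothetical remaining times (actual vs. predicted), in hours
print(normalized_errors([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

Dividing by the ADT is what makes the rmse and mae figures of Tables 1 and 2 comparable across metrics and (hopefully) confined to [0, 1].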
5.2 Test Results: Tuning of Parameters
We tried our approach (referred to as AA-TP here-
inafter) with different settings of its parameters, in-
cluding the base regression method (REGR) for in-
ducing the PPM of each discovered cluster. For the
sake of simplicity, we here only focus on the usage of
two basic regression methods: classic Linear regres-
sion (Draper and Smith, 1998), and the tree-based re-
gression algorithm RepTree (Witten and Frank, 2005).
In addition, we consider the case where each PPM
model simply encodes a k-NN regression procedure
(denoted by IB-k hereinafter), as a rough term of com-
parison with the family of instance-based regression
methods (including, in particular, the approach in (van
Dongen et al., 2008)). For all of the above regres-
sion methods, we reused the implementations avail-
able in the popular data-mining library Weka (Frank
et al., 2005). We recall that the other parameters
are: minSupp (i.e., the minimum support for frequent
patterns), kTop (i.e., the maximal number of patterns
to keep, and then use in the clustering), and maxGap
(i.e., the maximal number of spurious events allowed
to occur among those of a given pattern, in a trace that
supports the latter). Figure 4 allows for analyzing how
the three kinds of errors vary, respectively, when using
different regression methods (distinct curves are de-
picted for them) and different values of the parame-
ters (namely, maxGap = 0, 4, 8, ..., kTop = 4, ..., and
minSupp = 0.1, ..., 0.4).
Clearly, the underlying regression method is the
factor exhibiting the strongest impact on precision re-
sults. In particular, the disadvantage of using linear
regression is clear (whatever the error metric),
whereas both the IB-k and RepTree methods perform
quite well, and very similarly. This is good news, es-
pecially for the RepTree method, which is to be pre-
ferred to IB-k for scalability reasons. Indeed, the lat-
ter may end up being too time-consuming at run time,
since a large set of example traces must be kept, even
though, differently from pure instance-based methods
(like (van Dongen et al., 2008)), we do not need to
search across the whole log, but only within a single
cluster (previously selected, based on context data).
As to the remaining parameters, it is easily seen
that poorer results are obtained when minSupp = 0.1
and kTop = 4, as well as when minSupp = 0.4. As
a matter of fact, the former case epitomizes the cases
where we cut little (according to frequency) during
the generation of patterns, while trying to reduce their
number in the filtering phase; the latter, instead, is an
opposite situation, where a rather high support
threshold is employed, at a higher risk of los-
ing important pieces of information on process be-
haviour. In more detail, in the former case, the nega-
tive outcome is alleviated when setting maxGap = 0,
i.e., when the patterns are required to exactly match a seg-
ment (i.e., a subsequence of contiguous elements) in
their supporting traces. It is worth noticing that, apart
from these extreme cases, AA-TP exhibits good stabil-
ity and robustness over a wide range of parameter set-
tings. Remarkably, when minSupp takes a value in
[0.2, 0.3], the remaining two parameters, namely
kTop and maxGap, do not seem to affect the quality
of predictions at all. In practice, it suffices to choose
the regression method carefully (and a middling value
of minSupp) to ensure good and stable prediction out-
comes, regardless of the other parameters, which
would indeed be quite harder to tune in general.
5.3 Comparison with Competitors
Let us finally compare our approach with two other
ones, defined in the literature for the discovery of a
PPM: CA-TP (Folino et al., 2012) and FSM (van der
Aalst et al., 2011). Tables 1 and 2 report the average
errors (and associated standard deviations) made by
system AA-TP, while varying the base regression
method (namely, Linear, RepTree and IB-k). In par-
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
64
ticular, the first table regards the case where bag repre-
sentations are used for abstracting traces, whereas the
second corresponds to the usage of set abstractions.
These values were computed by averaging the ones
obtained with different settings of the parameters min-
Supp, kTop, and maxGap. Similarly, for both of
the approaches CA-TP and FSM, we computed the av-
erage of the results obtained using different values
of the history horizon parameter h (precisely, h =
1, 2, 4, 8, 16), and the best-performing setting for all
the remaining parameters, which are of minor in-
terest here, since we mainly want to contrast our ab-
straction strategy with the classical ones of the competitors.
Interestingly, the figures in Tables 1 and 2 indicate
that our approach is more accurate than both competi-
tors, irrespective of the abstraction strategy adopted.
It is worth noticing that the best results, for all
the error metrics, are obtained
when AA-TP is used with the bag abstraction mode.
Indeed, when combining this abstraction mode with
the IB-k regressor, AA-TP manages to lower the pre-
diction error by about 55.9% on average w.r.t. CA-TP,
and by an astonishing 74.1% w.r.t. FSM, on average
(over all the error metrics). Similar results are ob-
tained when using RepTree (still with bag abstrac-
tions), where a reduction of 50.7% (resp., 70.6%) is
achieved w.r.t. CA-TP (resp., FSM).
6 CONCLUSIONS
We have presented a new predictive process-mining
approach, which fully exploits context information,
and determines the right level of abstraction on
log traces in a data-driven way. Combining several
data mining and data transformation methods, the
approach allows for recognizing different context-
dependent process variants, while equipping each of
them with a separate regression model.
Encouraging results were obtained on a real appli-
cation scenario, showing that the method is precise
and robust, and yet requires little human interven-
tion. Indeed, to have low prediction errors it suffices
not to use extreme values for the support threshold,
regardless of the other parameters (i.e., maxGap and
kTop), which would indeed be harder to tune.
As future work, we plan to explore the usage of
sequence-like patterns (e.g., k-order subsequences),
possibly combined with those already considered
here, in order to capture the structure of a process
instance in a more precise (but still abstract enough)
manner, as well as to fully integrate our approach into
a real Business Process Management platform, in or-
der to offer advanced run-time services.
REFERENCES
Agrawal, R. and Srikant, R. (1994). Fast algorithms for
mining association rules in large databases. In Proc. of
20th Int. Conf. on Very Large Data Bases (VLDB’94),
pages 487–499.
Blockeel, H. and Raedt, L. D. (1998). Top-down induction
of first-order logical decision trees. Artificial Intelli-
gence, 101(1-2):285–297.
DLAI Group (1998). CLUS: A predictive clustering sys-
tem. Available at http://dtai.cs.kuleuven.be/clus/.
Draper, N. R. and Smith, H. (1998). Applied Regression
Analysis. Wiley Series in Probability and Statistics.
Folino, F., Guarascio, M., and Pontieri, L. (2012). Discover-
ing context-aware models for predicting business pro-
cess performances. In Proc. of 20th Int. Conf. on
Cooperative Information Systems (CoopIS’12), pages
287–304.
Frank, E., Hall, M. A., Holmes, G., Kirkby, R., and
Pfahringer, B. (2005). Weka - a machine learning
workbench for data mining. In The Data Mining and
Knowledge Discovery Handbook, pages 1305–1314.
Härdle, W. and Mammen, E. (1993). Comparing nonpara-
metric versus parametric regression fits. The Annals
of Statistics, 21(4):1926–1947.
Härdle, W. (1990). Applied Nonparametric Regression.
Cambridge University Press.
Quinlan, R. J. (1992). Learning with continuous classes. In
Proc. of 5th Australian Joint Conference on Artifi-
cial Intelligence (AI’92), pages 343–348.
van der Aalst, W. M. P., et al. (2007). ProM 4.0: Com-
prehensive support for real process analysis. In Proc.
of 28th Int. Conf. on Applications and Theory of Petri
Nets and Other Models of Concurrency (ICATPN’07),
pages 484–494.
van der Aalst, W. M. P., Schonenberg, M. H., and Song,
M. (2011). Time prediction based on process mining.
Information Systems, 36(2):450–475.
van der Aalst, W. M. P., van Dongen, B. F., Herbst,
J., Maruster, L., Schimm, G., and Weijters, A. J.
M. M. (2003). Workflow mining: a survey of is-
sues and approaches. Data & Knowledge Engineer-
ing, 47(2):237–267.
van Dongen, B. F., Crooy, R. A., and van der Aalst, W.
M. P. (2008). Cycle time prediction: When will this
case finally be finished? In Proc. of 16th Int. Conf. on
Cooperative Information Systems (CoopIS’08), pages
319–336.
Witten, I. H. and Frank, E. (2005). Data Mining: Practi-
cal Machine Learning Tools and Techniques, Second
Edition. Morgan Kaufmann Publishers Inc.
AData-adaptiveTraceAbstractionApproachtothePredictionofBusinessProcessPerformances
65