DISCO

VERING MULTI-PERSPECTIVE PROCESS MODELS

Francesco Folino

, Gianluigi Greco

, Antonella Guzzo

and Luigi Pontieri

ICAR-CNR, via P. Bucci 41C, I87036 Rende, Italy

Dept. of Mathematics, UNICAL, Via P. Bucci 30B, I87036 Rende, Italy

DEIS, UNICAL, Via P. Bucci 41C, I87036 Rende, Italy

Keywords:

Business Process Intelligence, Process Mining, Decision Trees.

Abstract:

Process Mining techniques exploit the information stored in the executions log of a process in order to extract

some high-level process model, which can be used for both analysis and design tasks. Most of these techniques

focus on “structural” (control-ﬂow oriented) aspects of the process, in that they only consider what elementary

activities were executed and in which ordering. In this way, any other “non-structural” information, usually

kept in real log systems (e.g., activity executors, parameter values, and time-stamps), is completely disre-

garded, yet being a potential source of knowledge. In this paper, we overcome this limitation by proposing

a novel approach for discovering process models, where the behavior of a process is characterized from both

structural and non-structural viewpoints. In a nutshell, different variants of the process (classes) are recognized

through a structural clustering approach, and represented with a collection of speciﬁc workﬂow models. Rel-

evant correlations between these classes and non-structural properties are made explicit through a rule-based

classiﬁcation model, which can be exploited for both explanation and prediction purposes. Results on real-

life application scenario evidence that the discovered models are often very accurate and capture important

knowledge on the process behavior.

1 INTRODUCTION

Process mining techniques have been getting increas-

ing attention in the Business Process Management

community because of their ability of characterizing

and analyzing process behaviors, which turns out to

be particularly useful in the (re)-design of complex

systems. In fact, these techniques are aimed at ex-

tracting a model for a process, based on the data gath-

ered during its past enactments and stored in suitable

log ﬁles by workﬂow (or similar kinds of transac-

tional) systems. In particular, traditional and consol-

idated approaches —see, e.g., (van der Aalst et al.,

2003) for a survey on this topic— are focused on

“structural” aspects of the process, i.e., they try to

single out mutual dependencies among the activities

involved in the process, in terms of relationships of

precedence and/or in terms of various routing con-

structs (such as parallelism, synchronization, exclu-

sive choice, and loops).

Only very recently, process mining techniques have

been proposed to deal with important “non-structural”

aspects of the process, such as activity executors, pa-

rameter values, and performance data. For instance,

decision trees have been used in (Ly et al., 2005)

to express staff assignment rules, which correlate

agent proﬁles with the execution of process activities;

while, a decision point analysis has been discussed

in (Rozinat and van der Aalst, 2006), where decision

trees are used to determine, at each split point of the

model, the activities that are most likely to be exe-

cuted next. Actually, despite the improvements w.r.t.

classical approaches, these techniques still exploit a

few simplifying assumptions that limit their applica-

bility in some application contexts. In particular, the

various approaches share the basic idea of using non-

structural data to characterize which activities are to

be executed, by completely disregarding the speciﬁc

coordination mechanisms of their enactment, i.e., how

the ﬂow of execution is inﬂuenced by certain data val-

ues.

As an example, in a sales order process where

different sub-processes are enacted depending on

whether the customer is a novel one or has, instead,

some ﬁdelity card, current approaches will hardly dis-

cover that two different usage scenarios occur and that

they can moreover be discriminated by some speciﬁc

data ﬁelds associated with the customer.

The aim of this paper is precisely to enhance cur-

rent techniques with the capability of characterizing

Folino F., Greco G., Guzzo A. and Pontieri L. (2008).

DISCOVERING MULTI-PERSPECTIVE PROCESS MODELS.

In Proceedings of the Tenth International Conference on Enterprise Information Systems - AIDSS, pages 70-77

DOI: 10.5220/0001705700700077

 SciTePress

how activity executors, parameter values and perfor-

mance data affect routing constructs and, more gener-

ally, how they determine the sub-processes to be exe-

cuted. To this end, we shall investigate on the mining

of a multi-perspective process model, where structural

aspects and non-structural ones are formally related

with each other. In fact, the model consists of:

• a (structure-centric) behavioral schema, where the

various sub-processes occurring in the process be-

ing analyzed are explicitly and independently de-

scribed; and,

• a (data-dependent) classiﬁcation model, assessing

in terms of decision rules over process instance

attributes when the various sub-processes have to

be enacted.

From a technical point of view, on the one hand the

behavioral schema will be discovered by resorting to

the structural clustering approach presented in (Greco

et al., 2006), where log instances are partitioned into

a number of clusters according to the sequence of

tasks performed, and where each cluster is eventu-

ally equipped with a speciﬁc workﬂow schema. On

the other hand, the classiﬁcation model will be built

by means of a rule-based classiﬁer, using the clusters

previously found as class labels for all their associated

traces.

The whole approach has been implemented as a

plug-in for the ProM framework (van Dongen et al.,

2005), a powerful platform for the analysis of process

logs, and has been tested over a challenging real-life

application scenario, regarding the trading of contain-

ers in an Italian harbor. Results of experimentation

evidence that very accurate models can be found with

our technique, which moreover provides useful hints

for the tuning of the involved logistic operations.

Organization. The remainder of the paper is orga-

nized as follows. After introducing some notation

and basic concepts, Section 2 discusses the proposed

approach in a detailed way, and reports some basic

implementation issues about the integration within

ProM. Then, the real-life case study and results of ex-

perimental activity are discussed in Section 3. Finally,

a few concluding remarks are reported in Section 4.

2 MULTI-PERSPECTIVE

PROCESS MODELS

Process logs contain a wide range of information

about process executions. Following a standard ap-

proach in the literature, we next shall adopt a simple

representation of process logs, where each trace stor-

ing a single enactment of the process (named, process

instance or case) is just viewed as a sequence of task

identiﬁers, while additional data are represented via

attributes associated with traces.

Let T be a set of task labels, and A = {A

,...,A

}

be a set of (names of) process attributes, taking values

from the domains D

,...,D

, respectively. Then, a

trace (over T and A) is a sequence s = [s

,...,s

] of

task labels in T , associated with a tuple data(s) of

values from D

,...,D

, i.e., data(s) ∈ D

× ...× D

A process log (or simply log) over T and A is a set of

traces over T and A.

By inspecting and analyzing a process log, our ap-

proach aims at extracting a high-level representation

of the (possibly unknown) underlying process, which

we call multi-perspective process model. Formally, a

multi-perspective process model for a log L is quadru-

ple M = hC

,λ,δ

i such that:

• C

= {C

,...C

} is a partition of the traces in L,

i.e.,

i=1..q

0 and

i=1..q

= L;

• W

= {W

,...W

} is a set of workﬂow

schemas, one for each cluster in C

, where all

tasks are named with labels from T ;

• λ is bijective function mapping the clusters C

to their corresponding workﬂow schemas in W

i.e., λ : C

→ W

, where λ(C

) is the schema

modeling cluster C

;

• δ

: D

× ... × D

→ C

is q-ary classiﬁcation

function, which discriminates among the clusters

in C

based on the values of A’s attributes.

In words, W

is a modular description of the be-

havior registered in the log, as far as concerns struc-

tural aspects only, where each execution cluster is rep-

resented with a separate workﬂow schema, describing

the typical way tasks are executed in that cluster. Con-

versely, δ

is a classiﬁcation function that correlates

these structural clusters with (non-structural) trace at-

tributes, by mapping any n-ple of values for the at-

tributes A

, ..., A

into a single cluster of C

For notation convenience, in the rest of the pa-

per the above model will be seen as two sub-models,

which represent the process in different and yet com-

plementary ways: a structural model hC

,λi,

which focuses on the different execution ﬂows across

the process tasks, and a data-aware classiﬁcation

model, which is essentially encoded in the function

in the form of a decision tree (Buntine, 1992). In

fact, yet being strictly connected between each other,

both models are relevant in themselves, for they take

care of separate aspects of the process.

DISCOVERING MULTI-PERSPECTIVE PROCESS MODELS

2.1 Algorithmic Issues

Input: A log L over tasks T and attributes A.

Output: A multi-perspective process model for L.

Method: Perform the following steps:

1 C

structuralClustering

(L);

2 δ

induceClassifier

);

3 C

classifyTraces

(L,δ);

4 for each C

∈ C

5 W

WFDiscovery

);

6 W

:= W

− {λ(C

)} ∪ {W

};

7 λ(C

) := {W

};

8 end

9 return hC

,λ,δ

Figure 1: Algorithm

ProcessDiscovery.

As discussed in the introduction, the problem of dis-

covering process models based on structural informa-

tion has been widely investigated in the literature. In

Figure 1 an algorithm (named

ProcessDiscovery

) is

reported that addresses instead the more general prob-

lem of discovering a high-level process representa-

tion, by computing a multi-perspective process model

for a given log L.

To this aim, the recursive clustering scheme

recently proposed in (Greco et al., 2006)

is ﬁrst exploited as a sub-routine (function

structuralClustering

) to recognize behav-

iorally homogenous groups of traces. Actually, log

traces are partitioned by applying the well-known

k-means method (Mitchell, 1997) to a vectorial

representation of the traces, obtained by using a

special kind of sequential patterns over the tasks

—further details can be found in (Greco et al., 2006).

Afterwards, the algorithm takes advantage of all

available data attributes by invocating the func-

tion

induceClassifier

, which derives a classiﬁca-

tion function (see Step 2), expressing the mapping

from data attributes to the clusters discovered previ-

ously. Note that, by using the clusters produced by

structuralCLustering

as class labels for all their

associated traces, we may learn a classiﬁcation model

over the attributes in A. Indeed, a wide variety of

models and algorithms might be exploited to this end.

Among them, we selected the popular formalism of

decision trees, mainly for the following two reasons.

First, decision trees can easily be understood and do

not require any prior assumptions on data distribution;

and, second, a number of algorithms are available that

compute them in an efﬁcient way, which cope quite

well with noise and overﬁtting as well as with miss-

ing values.

The remainder of the

ProcessDiscovery

algo-

rithm is meant to reﬁne the structural model com-

puted in the ﬁrst phase, by making it more consis-

tent with the classiﬁcation function δ just discovered.

Indeed, all the log traces are ﬁrst remapped to the

clusters, in Step 3, by using the model in a predic-

tive way. A new workﬂow model is then discovered

for each cluster, in Steps 4-8, which should model

more precisely the structure of the traces that the

cluster has been made contain—note that the func-

tion

mineWF

basically stands for any standard pro-

cess mining algorithm in the literature, such as those

discussed in the survey (van der Aalst et al., 2003).

Eventually, the algorithm concludes by returning a

multi-perspective process model integrating the dif-

ferent kinds of knowledge discovered.

2.2 Implementation Issues

The above approach has been implemented into a pro-

totype system, which has been integrated as a plug-

in within the ProM framework (van Dongen et al.,

2005), a powerful platform for the analysis of process

logs, quite popular in the Process Mining community.

The logical architecture of the system is sketched

Figure 2, where solid arrow lines stand for informa-

tion exchange. Note that the whole mining process is

driven by the module Process Mining Handler, while

the other modules roughly replicate the computation

scheme in Figure 1. By Log Repository we denote

a collection of existing process logs, represented in

MXML (van Dongen and van der Aalst, 2005), a for-

mat shared by many process mining tools (including,

in particular, the ProM framework).

The Structural Mining module, together with the

ones connected with it, is responsible for the con-

struction of structural models. In particular, the dis-

covery of different trace clusters is carried out by the

module Structural Clustering, which exploits some

functionalities of the plug-in DWS implementing the

clustering scheme in (Greco et al., 2006). The mod-

ule WFDiscovery is used, instead, to derive a work-

ﬂow schema for each discovered cluster, by exploit-

ing the ProM implementation of the Heuristic miner

algorithm (Weijters and van der Aalst, 2003).

Discovered clusters and schemas are stored in the

Trace Clusters repository and in the Workﬂow Repos-

itory, respectively, for inspecting and further analysis.

The Training Set Builder module mainly labels

each trace with the name of the cluster it has been

assigned to, as it is registered in the Trace Clusters

repository, in order to provide the Classiﬁer induction

module with a training set for learning a classiﬁcation

model. The module Classiﬁer applies such a model

ICEIS 2008 - International Conference on Enterprise Information Systems

Log

Repository

UserInterface

ProcessMining

Handler

tructural

Mining

Trace

Clusters

Discovered

orkflowSchemas

FDiscovery

Structural

Clustering

parameters

Workflow

Repository

Classification

Mining

Classifier

nduction

Labeled

traces

Classification

model

Repository

lassifier

Log

Handler

Logrequest

Tracefetching

Fetched Traces

Training Set

Builder

LogExtractionRequest

Clusteredtraces

Inducted

Classificationmodel

Classifierparameters

Clusterremapping

Workflow

Schemas

Classifiedtraces

Classifier

Evaluator

Conformance

Evaluator

Model

Evaluation

Figure 2: System architecture.

to reassign the original log traces to the clusters. Both

these latter modules, coordinated by the module Clas-

siﬁcation Mining, have been implemented by using

the Weka library (Frank et al., 2005) and, in particular,

the J48 implementation of the well known algorithm

C4.5 (Quinlan, 1993). In this regard, we notice that

the Training Set Builder also acts as a “translator” by

encoding the labelled log traces into a tabular form,

according to the ARFF format used in Weka. Inci-

dentally, this format requires that proper types (such

as nominal, numeric, string) are speciﬁed for all at-

tributes. Since this information misses in the origi-

nal (MXML) log ﬁles, a semi-automatic procedure is

used to assign such a type to each attribute.

Two modules help the user in evaluating the qual-

ity of the discovered models: the Classiﬁer Evaluator,

which computes standard performance indexes for the

classiﬁcation models, and the Conformance Evalua-

tor module which measures what an extent workﬂow

models represents the behaviors actually registered in

the log traces they have been derived from. To this

end, some metrics have been implemented, which will

be brieﬂy described in Section 3.2.

Figure 3 shows two screenshots of the plug-in.

Speciﬁcally, Figure 3(a) reports the input panel,

which allows for setting all the parameters needed in

our approach, whereas Figure 3(b) illustrates the out-

come of a mining session. For the latter, note, in par-

ticular, the decision tree (on the left) and one of the

workﬂow schemas (on the right) found by the tool.

3 CASE STUDY

This section is devoted to discuss the application of

the proposed approach within a real-life scenario,

concerning an Italian maritime container terminal, as

a way to give some evidence for the validity of our

proposal, as well as to better explain its behavior

through a practical example. In Section 3.1, we ﬁrst

illustrate the application scenario, by discussing the

kind of data involved in it. Then, in Section 3.2, we

introduce the setting adopted for evaluating the qual-

ity of discovered models, while in Section 3.3, we

eventually discuss results of experiments carried out

on the scenario.

3.1 Application Scenario

A series of logistic activities are registered for each

of the containers that pass through the harbor, which

actually amount to nearly 4 millions per year. Mas-

sive volumes of data are hence generated continually,

which can proﬁtably be exploited to analyze and im-

prove the enactment of logistics processes. For the

sake of simplicity, in the remainder of the paper we

only consider the handling of containers which both

arrive and depart by sea, and focus on the different

kinds of moves they undergo over the “yard”, i.e.,

the main area used in the harbor for storage purposes.

This area is logically partitioned into a ﬁnite number

of bi-dimensional slots, which are the units of storage

space used for containers, and are organized in a ﬁxed

number of sectors.

The life cycle of any container is roughly sum-

DISCOVERING MULTI-PERSPECTIVE PROCESS MODELS

(a) Parameter setting

(b) Result view

Figure 3: Screenshots of the plug-in within ProM.

marized as follows. The container is unloaded from

the ship and temporarily placed near to the dock, un-

til it is carried to some suitable yard slot for being

stocked. Symmetrically, at boarding time, the con-

tainer is ﬁrst placed in a yard area close to the dock,

and then loaded on the cargo. Different kinds of vehi-

cles can be used for moving a container, including,

e.g., cranes, straddle-carriers (a vehicle capable of

picking and carrying a container, by possibly lifting

it up), and multi-trailers (a sort of train-like vehicle

that can transport many containers).

This basic life cycle may be extended with ad-

ditional transfers—classiﬁed as “house-keeping”—

which are meant to make the container approach its

ﬁnal embark point or to leave room for other contain-

ers. More precisely, the following basic operations

are registered for any container c:

•

MOV

, when c is moved from a yard position to an-

other by a straddle carrier;

•

DRB

, when c is moved from a yard position to an-

other by a multi-trailer;

•

DRG

, when a multi-trailer moves to get c;

•

LOAD

, when c is charged on a multi-trailer;

•

DIS

, when c is discharged off a multi-trailer;

•

SHF

, when c is moved upward or downward, pos-

sibly to switch its position with another container.

•

OUT

, when a dock crane embarks c on a ship.

Critical performance aspects in this scenario are

the latency time elapsed when serving a ship (where,

typically, a number of containers are both discharged

off and charged on), and the overall costs of moving

the containers around the yard. Both these measures

are heavily affected by the number of extra, “house-

keeping”, moves applied to the containers. Reducing

such operations is a major goal of the allocation poli-

cies, established to decide where to place a newly ar-

rived container, based on different features of it, such

as, e.g., the kind of conveyed goods, its origin and

next destination, as well as some future events (e.g.

which ships are going to take the container and when

this is going to happen). Unfortunately, this informa-

tion can partly be missing or even incorrect, while di-

verse types of delays or malfunctioning are likely to

occur. Therefore, some ex-post analysis of yard op-

eration logs via data mining and process mining tech-

nique is expected to provide precious feedback to the

planning of logistic processes, and to help in decision

making tasks.

3.2 Evaluation Setting

In order to provide a quantitative evaluation of the

multi-perspective process models mined with our ap-

proach, two aspects have been measured in the ex-

perimentation: (i) the predictive power of the classiﬁ-

cation model in discriminating the structural clusters

found, and (ii) the level of conformance of each work-

ﬂow schema w.r.t. the clusters that it models.

As for point (i), a number of well-known measures

are available in the literature to evaluate a classiﬁca-

tion model. Here, we simply resort to the Accuracy

measure, which roughly expresses the percentage of

correct predictions that the classiﬁer would make over

all possible traces of the process. In particular, we es-

timate this measure via the popular cross-validation

method (with ten folders) (Mitchell, 1997). In addi-

tion, standard Precision and Recall measures will be

used to provide a “local” evaluation of the classiﬁer,

w.r.t. each single cluster; and, for each cluster, the

F measure will be reported as well, which is deﬁned

as F

= ((β

+ 1)P

× R

)/(P

+ β

)—note that for

β = 1, it coincides with the harmonic mean of the pre-

cision and recall values.

As for point (ii), the conformance of a workﬂow

model W w.r.t. a set L of traces can be evaluated

through the following metrics, all deﬁned in (Rozi-

ICEIS 2008 - International Conference on Enterprise Information Systems

Table 1: Summary of results obtained against real log data.

Test Clusters Structural model Data-aware classiﬁcation model

Fitness BehAppr StrAppr Accuracy Tree Size Top-4 Discriminant Attributes

Test A

2 0.8725 0.9024 1 96.01% 69 PrevHarbor, NavLine OUT, ContHeight, ShipType IN

Test A-bis

4 0.9324 0.9351 1 94.78% 77 PrevHarbor, NavLine OUT, ContHeight, ShipType IN

Test B

5 0.8558 0.9140 1 91.64% 105 ShipSize IN, NavLine OUT, PrevHarbor, ContHeight

nat and van der Aalst, 2008) and ranging over the real

interval [0,1]:

• Fitness, which essentially evaluates the ability of

W to parse all the traces L, by indicating how

much the events in L comply with W .

• Advanced Behavioral Appropriateness (denoted

by BehAppr, for short), which estimates the

level of ﬂexibility allowed in W (i.e., alterna-

tive/parallel behavior) really used to produce L.

• Advanced Structural Appropriateness (or

StrAppr, for short), which assesses the capability

of W to describe L in a maximally concise way.

These measures have been deﬁned for a workﬂow

schema and do not apply directly to the multi-schema

structural model discovered by our approach. In order

to show a single overall score for such a model, we

simply average the values computed by each of these

measures against all of its workﬂow schemas. More

precisely, the conformance values of these schemas

are added up in a weighted way, where the weight of

each schema is the fraction of original log traces that

constitute the cluster it was mined from.

3.3 Experimental Results

For all the experiments described next, we selected

only a subset of containers that completed their en-

tire life cycle in the hub along the ﬁrst two months of

year 2006, and which were exchanged with four given

ports around the Mediterranean sea—this yielded

about 50Mb log data concerning 5336 containers. In

order to apply our analysis approach, the transit of any

container through the hub has been regarded as a sin-

gle enactment case of a (unknown) logistic process,

for which a suitable model is to be discovered.

We next discuss three tests, which were carried out

on these data according to different perspectives:

• Test A (“operation-centric”), where we focus on

the sequence of basic logistic operations applied

to the containers. In more detail, for each con-

tainer a distinct log trace is built that records the

sequence of basic operations (i.e.,

MOV

DRB

DRG

LOAD

DIS

SHF

OUT

) it underwent.

• Test A-bis, where we still consider the sequence

of operations performed on each container, while

distinguishing different cost-dependent variants

for each of them. In fact, each operation is an-

notated with a sufﬁx denoting the cost spent for

performing it.

• Test B (“position-centric”), where we focus on

the ﬂow of containers across the yard. Speciﬁ-

cally, original data are transformed into a set of

log trace, each of them encoding the sequence of

yard sectors occupied by a single container during

its stay.

In all cases, two dummy activities (denoted by

START

and

END

, respectively) are used to univocally

mark the beginning and the end of each log trace.

Further, various data attributes have been considered

for each container (i.e., for each process instance), in-

cluding, e.g., its origin and ﬁnal destination ports, its

previous and next calls, diverse characteristics of the

ship that unloaded it, its physical features (e.g., size,

weight), and a series of categorical attributes concern-

ing its contents (e.g., the presence of dangerous or

perishable goods).

Table 1 summarizes a few relevant ﬁgures for these

tests, which concern both the clustering structure and

the two models associated with it. More speciﬁcally,

for each test, we report the number of clusters found

and the different conformance measures for the struc-

tural model (i.e., the set of workﬂow schemas), as

well as the accuracy and size of the decision tree.

Moreover, to give some intuition on the semantics

value of this latter model, we report four of the most

discriminant attributes, actually appearing in its top

levels

. In general, we note that surprisingly high ef-

fectiveness results have been achieved, as concerns

both the structural model and the classiﬁcation model.

Notably, such a precision in modelling both structural

and non-structural aspects of the logged events does

not come with a verbose (and possibly overﬁtting)

representation. Indeed, for all the tests, the number

of clusters and the size of the tree are quite restrained,

while the workﬂow models collectively attain a max-

imal score with the StrAppr metric.

In order to give more insight on the behavior of the

approach, we next illustrate some detailed results for

two of these tests (namely Test A and Test B).

Test A. In this case, the approach has discovered

Decision trees are not shown in detail for both space

and privacy reasons—many attribute express, indeed, sensi-

ble information about the hub society and its partners.

DISCOVERING MULTI-PERSPECTIVE PROCESS MODELS

(a) Cluster 0 (b) Cluster 1

Figure 4: Test A - the two workﬂow schemas discovered.

Table 2: Test A - details on the discovered clusters (sizes

and classiﬁcation metrics).

Cluster Size P R F (β = 1)

4736 97.28% 98.25% 97.76%

600 84.99% 78.33% 81.53%

two distinct usage scenarios, whose structural aspects

are described by the workﬂow schemas in Figure 4.

These schemas substantially differ for the presence of

operations performed with multi-trailer vehicles: the

schema of Figure 4.(a) does not feature any of these

operations, which are instead contained in the other

schema. Notably, the former schema captures the vast

majority of handling cases (4736 containers of the

original 5336 ones). This reﬂects a major aim of yard

allocation strategies: to keep each container as near

as possible to its positions of disembarkation/embark,

by performing short transfers via straddle-carriers.

Interestingly enough, these two markedly differ-

ent structural models appear to strongly depend—

an astonishing 96% accuracy score is achieved (cf.

Table 1)—on some features that go beyond the

mere occurrence of yard operations. Among these

features, the following container attributes stand

out: the provenance port (

PrevHarbor

) of a con-

tainer, the navigation line that is going to take

it away (

NavLine OUT

), the height of a container

(

ContHeight

), and the kind of ship that delivered it

to the hub (

ShipType IN

). A ﬁner grain analysis per-

formed with the help of Table 2 (where individual pre-

(a) Cluster 2

(b) Cluster 3

Figure 5: Test B - two of the workﬂow schemas found.

Table 3: Test B - details on the discovered clusters (sizes

and classiﬁcation metrics).

Cluster Size P R F (β = 1)

3664 93,76% 98,39% 96, 02%

188 66,67% 53,19% 59,17%

346 90,81% 74,28% 81,72%

1070 87,25% 80,56% 83, 77%

68 94,29% 97,06% 95, 65%

cision/recall measures for the two clusters are shown),

conﬁrms that the model guarantees a high rate of cor-

rect predictions for either cluster.

Test B. The “position-centric” approach is addressed

to analyze the different ways of displacing the con-

tainers around the yard, yet comparing the usage of

different storage areas. In principle, due to the high

number of sectors and moving patterns that come to

play in such analysis perspective, any ﬂat represen-

tation of container ﬂows, just consisting of a single

workﬂow schema, risks being either inaccurate or dif-

ﬁcult to interpret. Conversely, by separating different

behavioral classes our approach ensures a modular

representation, which can better support explorative

analyses. In fact, the ﬁve clusters found in this test

have been equipped with clear and compact workﬂow

schemas, which yet guarantee a high level of confor-

ICEIS 2008 - International Conference on Enterprise Information Systems

mance (see Table 1). As an instance, in Figure 5, we

report two of these schemas, which differ both in the

usage of sectors and in some of the paths followed by

the containers across these sectors.

Good quality results are achieved again both for the

structural model and for the decision tree. In actual

fact, by comparing these results with those obtained

in the other two tests, we notice some slight decrease

in the accuracy and a larger tree size, mainly due to

the higher level of complexity that distinguish the

position-centric analysis from the operation-centric

one. Incidentally, Table 3 reveals that such worsening

is mainly to blame on the inability of the decision tree

to recognize well the second cluster, which is, in fact,

slightly confused with the third one —further details

are omitted here for lack of space. Almost the same

attributes as in Test A have been employed to discrimi-

nate the clusters, except for the usage of

ShipSize IN

(i.e., the size category of the ship that delivered the

container) in place of

ShipType IN

4 CONCLUSIONS

In this paper we have proposed a novel process

mining approach which allows to discover multi-

perspective process models, i.e., process models

where structural and non-structural aspects are for-

mally related with each other. In a nutshell, the ap-

proach recognizes a number of homogeneous execu-

tion clusters (by way of a clustering method), while

providing each of them with a speciﬁc workﬂow

model. These classes of behavior are then correlated

with non-structural data (such as activity executors,

parameter values and performance metrics) by means

of a classiﬁcation model, which can be used to both

explain and predict them.

The whole approach has been implemented, inte-

grated as a plug-in in the ProM framework, and val-

idated on a real test case. Results of experimenta-

tion evidenced that the discovered classiﬁcation mod-

els provide a valuable help for interpreting and dis-

criminating different ways of executing the process,

because of their ability of making explicit the link be-

tween these variants and other process properties.

As future work, we will investigate the extension

of the proposed approach with outlier detection tech-

niques, in order to provide it with the capability of

spotting anomalous executions, which may mislead

the learning of process models, and could proﬁtably

be analyzed in a separate way. Moreover, we are ex-

ploring the integration of multi-perspective models in

an existing process management platform, as a basic

means for providing prediction and simulation fea-

tures, supporting both the design of new processes

and the enactment of future cases.

REFERENCES

Buntine, W. (1992). Learning classiﬁcation trees. Statistics

and Computation, 2:63–73.

Frank, E., Hall, M. A., Holmes, G., Kirkby, R., and

Pfahringer, B. (2005). Weka - a machine learning

workbench for data mining. In The Data Mining and

Knowledge Discovery Handbook, pages 1305–1314.

Greco, G., Guzzo, A., Pontieri, L., and Sacc

a, D. (2006).

Discovering expressive process models by clustering

log traces. IEEE Transactions on Knowledge and

Data Engineering, 18(8):1010–1027.

Ly, L. T., Rinderle, S., Dadam, P., and Reichert, M. (2005).

Mining staff assignment rules from event-based data.

In Business Process Management Workshops, pages

177–190.

Mitchell, T. (1997). Machine Learning. McGraw-Hill.

Quinlan, J. (1993). C4.5: Programs for Machine Learning.

Morgan Kaufmann.

Rozinat, A. and van der Aalst, W. M. P. (2006). Decision

mining in ProM. In Proc. of 4th Intl. Conf. on Business

Process Management (BPM’06), pages 420–425.

Rozinat, A. and van der Aalst, W. M. P. (2008). Confor-

mance checking of processes based on monitoring real

behavior. Information Systems, 33(1):64–95.

van der Aalst, W. M. P., van Dongen, B. F., Herbst, J.,

Maruster, L., Schimm, G., and Weijters, A. J. M. M.

(2003). Workﬂow mining: A survey of issues and ap-

proaches. Data Knowledge Engineering, 47(2):237–

267.

van Dongen, B. F., de Medeiros, A. K. A., Verbeek, H.

M. W., Weijters, A. J. M. M., and van der Aalst, W.

M. P. (2005). The ProM framework: A new era in

process mining tool support. In Proc. of 26th Interna-

tional Conference on Applications and Theory of Petri

Nets (ICATPN ’05), pages 444–454.

van Dongen, B. F. and van der Aalst, W. M. P. (2005).

A meta model for process mining data. In Proc. of

EMOI-INTEROP, pages 309–320.

Weijters, A. J. M. M. and van der Aalst, W. M. P. (2003).

Rediscovering workﬂow models from event-based

data using little thumb. Integrated Computer-Aided

Engineering, 10(2):151–162.

DISCOVERING MULTI-PERSPECTIVE PROCESS MODELS