which a process model is to be inferred and repre-
sented with a formalism (workflows, state charts or
Petri nets, for example) (van der Aalst and Weijters,
2004). One of the first algorithms was proposed in
(Agrawal et al., 1998). The algorithm aims at finding
workflow graphs from a set of series of events
contained in a workflow log. An event represents the
start time of a task. To avoid the problem of potential
cycles (i.e. repeated events in a series), the algorithm
first renames the repeated task labels before
enumerating the binary dependency relations between
the tasks. This set of relations is then reduced using
the transitivity property of the binary relations.
Labels are then renamed again to merge the tasks, which
makes it possible to introduce cycles in the model.
Difficulties arise with this approach when (i) the tasks are
statistically independent and (ii) the number of tasks
is large (Agrawal et al., 1998). Nevertheless, Pinter
and Golani (Pinter and Golani, 2004) extend this algorithm,
notably by introducing events that mark the end
of the tasks. Similar issues in the context of software
engineering processes are investigated in (Cook and
Wolf, 1998) where the aim is to build a finite state
machine from the set of the most frequent event patterns
mined in a given log. In particular, the Markov
algorithm is based on a second-order Markov chain that
is converted into states and state transitions. Cook and
Wolf (Cook and Wolf, 2004) extend this method to
concurrent processes and use a first-order Markov
chain to this end. The difficulties come from the
pruning of the finite state machine to obtain a minimal
model and from the sensitivity of the pruning metrics to
"noise" (van der Aalst and Weijters, 2004). Van der Aalst
et al. (van der Aalst et al., 2004) define the class of processes
that can be modeled with the α-algorithm, but this algorithm
requires the series of events in the log to be
noise-free and complete.
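As an illustration of the dependency enumeration and transitivity-based reduction described above, the following is a minimal sketch and not the algorithm of (Agrawal et al., 1998) itself. It assumes that each task appears at most once per series (so no label renaming is needed) and that the resulting dependency graph is acyclic. The function names and the example log are hypothetical.

```python
from itertools import combinations


def follows_pairs(log):
    """Return every pair (a, b) such that a occurs before b in some series."""
    pairs = set()
    for series in log:
        for i, j in combinations(range(len(series)), 2):
            pairs.add((series[i], series[j]))
    return pairs


def dependencies(log):
    """Keep (a, b) only when b is never observed before a.

    Tasks seen in both orders are treated as independent (assumption).
    """
    pairs = follows_pairs(log)
    return {(a, b) for (a, b) in pairs if (b, a) not in pairs}


def transitive_reduction(edges):
    """Drop an edge (a, b) when another path from a to b exists.

    Assumes the dependency graph is acyclic.
    """
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    def reachable(src, dst, skipped_edge):
        # Depth-first search that ignores the edge under test.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if (node, nxt) == skipped_edge or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {(a, b) for (a, b) in edges if not reachable(a, b, (a, b))}


# Hypothetical log: B and C are observed in both orders.
example_log = [["A", "B", "C", "D"], ["A", "C", "B", "D"]]
reduced = transitive_reduction(dependencies(example_log))
print(sorted(reduced))
```

In this example the direct dependency from A to D is dropped because it is implied by the path through B, which is the kind of reduction the transitivity property allows.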
There is a consensus that finite state machines
are difficult to understand and to validate. Moreover,
most of the proposed methods have difficulties when
(i) the process contains many steps, (ii) the series in
the log induce potential cycles in the models and (iii)
the sequences are noisy or incomplete. The
Stochastic Approach framework (Le Goc et al., 2005)
for discovering temporal knowledge from timed observations
provides a general framework for modeling
dynamic processes that is based on a Markovian representation
but uses abstract chronicle models (Ghallab,
1996) instead of finite state machines. This framework
considers that the timed messages of a series are
written in a database by a program, called a monitor-
ing cognitive agent MCA, that monitors a production
process Pr. A timed message is represented with an
occurrence of a discrete event class $C_i = \{e_i\}$ that is
an arbitrary set of discrete events $e_i = (x_i, \delta_i)$, where
$\delta_i$ is one of the discrete values of the variable $x_i$. When
the variable $x_i$ is not known, an abstract variable $\phi_i$
is used to define the discrete event $e_i = (\phi_i, \delta_i)$
corresponding to the constant $\delta_i$. A discrete event class
is often a singleton because in that case, two discrete
event classes $C_i = \{(x_i, \delta_i)\}$ and $C_j = \{(x_j, \delta_j)\}$ are
only linked with the variables $x_i$ and $x_j$ when the
constants $\delta_i$ and $\delta_j$ are independent (Le Goc, 2006). This
condition is only concerned with the programs the
MCA is made with. A sequence of discrete event
class occurrences is then considered as the observ-
able manifestation of a series of state transitions in
a timed stochastic automaton representing the couple
(Pr, MCA). The BJT4G algorithm represents a
set of sequences of discrete event class occurrences
with a first-order Markov chain and uses abductive
reasoning to identify the set of the most probable
timed sequential binary relations between discrete
event classes leading to a given class. A timed
sequential binary relation $R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}])$ is an
oriented relation between two discrete event classes $C_i$
and $C_j$ that is time-constrained by the interval
$[\tau^-_{ij}, \tau^+_{ij}]$, which is the time interval for observing
an occurrence of the $C_j$ class after an occurrence of
the $C_i$ class. The set of timed sequential binary relations
is an abstract chronicle model that is graphically
represented with the ELP language (Event Language
for Process), where the nodes are discrete event classes
and the links are timed sequential binary relations. In
this paper, we propose to tackle the two main problems
of Process Mining approaches with an extension
of the Stochastic Approach framework.
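To make the notion of a timed sequential binary relation more concrete, the sketch below estimates, from a few timed sequences, a first-order transition frequency between consecutive classes together with an observed delay interval. It is only an illustration and not the BJT4G algorithm: taking the minimum and maximum observed delays as the interval bounds is an assumption made here, and the function name, the data layout and the example values are hypothetical.

```python
from collections import defaultdict


def timed_relations(sequences):
    """Estimate relations between consecutive class occurrences.

    Each sequence is a list of (class_label, timestamp) pairs sorted by time.
    Returns, for each ordered pair (ci, cj) observed consecutively, the
    relative frequency of cj directly after ci and the (min, max) delay.
    """
    counts = defaultdict(int)    # occurrences of ci directly followed by cj
    outgoing = defaultdict(int)  # occurrences of ci followed by anything
    delays = {}                  # (min delay, max delay) per pair
    for seq in sequences:
        for (ci, ti), (cj, tj) in zip(seq, seq[1:]):
            delay = tj - ti
            counts[(ci, cj)] += 1
            outgoing[ci] += 1
            low, high = delays.get((ci, cj), (delay, delay))
            delays[(ci, cj)] = (min(low, delay), max(high, delay))
    return {
        pair: {"p": n / outgoing[pair[0]], "interval": delays[pair]}
        for pair, n in counts.items()
    }


# Hypothetical timed sequences of class occurrences.
example_sequences = [
    [("A", 0), ("B", 2), ("C", 5)],
    [("A", 10), ("B", 13), ("C", 15)],
    [("A", 20), ("C", 24)],
]
relations = timed_relations(example_sequences)
for (ci, cj), info in sorted(relations.items()):
    print(ci, "->", cj, info)
```

Each entry of the result can be read as one candidate link of a graph whose nodes are classes, with the delay interval attached to the link.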
3 EXTENSION OF THE
STOCHASTIC APPROACH FOR
PROCESS MINING
3.1 Motivation
Let us take an example to illustrate the proposed extensions
with a manufacturing process having a set
$S = \{A, B, C, D, E\}$ of 5 manufacturing steps. Suppose
the supervision system records the execution of
a step with a message $X(t_k)$ denoting the time $t_k$ of the
beginning of the execution of the step $X$. The three
series of messages of Table 1 are represented with the
abstract chronicle model of Figure 1. In this model, if
the two nodes labeled with A denote the same manufacturing
step, then the two nodes must be merged,
introducing a cycle in the model. The same reasoning
must be applied to the other nodes, making the model