2 STATE OF THE ART
2.1 Process Mining Area
In the Process Mining framework, a series of messages
is considered as an ordered set of events from which a
process model can be inferred and represented in some
formalism (workflows, state charts or Petri nets, for
example) (van der Aalst and Weijters, 2004). One of the
first algorithms was proposed in
(Agrawal et al., 1998). The algorithm aims at find-
ing workflow graphs from a set of series of events
contained in a workflow log. An event represents the
start time of a task. To avoid the problem of potential
cycles (i.e. repeated events in a series), the algorithm
first renames the repeated task labels before
enumerating the binary dependency relations between
the tasks. This set of relations is then reduced using
the transitivity property of the binary relations.
Labels are then renamed again to merge the tasks, which
makes it possible to introduce cycles in the model.
Difficulties arise with this approach when (i) the tasks
are statistically independent and (ii) the number of
tasks is large (Agrawal et al., 1998). Nevertheless,
Pinter and Golani (Pinter and Golani, 2004) extend this
algorithm, notably by introducing events that mark the
end of the tasks. Similar issues in the context of software
engineering processes are investigated in (Cook and
Wolf, 1998) where the aim is to build a finite state
machine from the set of the most frequent event
patterns mined in a given log. In particular, the Markov
algorithm is based on a second-order Markov chain that
is converted into states and state transitions. Cook and
Wolf (Cook and Wolf, 2004) extend this method to
concurrent processes and use a first-order Markov
chain for this purpose. The difficulties come from the
pruning of the finite state machine to obtain a minimal
model and from the sensitivity of the pruning metrics to
noise (van der Aalst and Weijters, 2004). Van der Aalst
et al. (van der Aalst et al., 2004) define the class of
processes that can be modeled with the α-algorithm, but
this algorithm requires the series of events in the log
to be noise-free and complete.
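The dependency-based idea described above (enumerating binary dependency relations between tasks, then reducing them by transitivity) can be illustrated with a minimal Python sketch. It is not the algorithm of (Agrawal et al., 1998): it assumes traces without repeated task labels, it uses a simplified dependency criterion (a precedes b in at least one trace and b never precedes a), and the function names `dependency_relations` and `transitive_reduction` are hypothetical.

```python
from itertools import combinations


def dependency_relations(traces):
    """Return pairs (a, b) where a occurs before b in at least one trace
    and b never occurs before a in any trace (simplifying assumption)."""
    before = set()
    for trace in traces:
        # combinations() keeps positional order, so trace[i] precedes trace[j].
        for i, j in combinations(range(len(trace)), 2):
            before.add((trace[i], trace[j]))
    return {(a, b) for (a, b) in before if a != b and (b, a) not in before}


def transitive_reduction(edges):
    """Drop every edge (a, b) for which b is still reachable from a
    through some other path, i.e. the edge is implied by transitivity."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    def reachable(start, target, skipped_edge):
        # Depth-first search that ignores the direct edge under test.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if (node, nxt) == skipped_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {(a, b) for (a, b) in edges if not reachable(a, b, (a, b))}


if __name__ == "__main__":
    # Hypothetical log: tasks B and C are observed in both orders.
    log = [["A", "B", "C", "D"], ["A", "C", "B", "D"]]
    print(sorted(transitive_reduction(dependency_relations(log))))
```

In this hypothetical log, B and C appear in both orders, so no dependency is kept between them, and the direct relation from A to D is removed because it is implied by the path through B (or C).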
There is a consensus that finite state machines are
difficult to understand and to validate. Moreover, most
of the proposed methods have difficulties when (i) the
process contains many steps, (ii) the series in the log
induce potential cycles in the models and (iii) the
sequences are not noise-free and complete. The TOM4L
Approach (Timed Observations
Mined for Learning, previously called Stochastic Ap-
proach Framework) (Le Goc et al., 2005) for discov-
ering temporal knowledge from timed observations
provides a general framework for modeling dynamic
processes that is based on a Markovian representation
but uses abstract chronicle models (Ghallab, 1996)
instead of finite state machines. This framework
considers that the timed messages of a series are
written in a database by a program, called a monitoring
cognitive agent (MCA), that monitors a production process
Pr. A timed message is represented with an occurrence
of a discrete event class $C_i = \{e_i\}$ that is an
arbitrary set of discrete events $e_i = (x_i, \delta_i)$,
where $\delta_i$ is one of the discrete values of the
variable $x_i$. When the variable $x_i$ is not known, an
abstract variable $\phi_i$ is used to define the discrete
event $e_i = (\phi_i, \delta_i)$ corresponding to the
constant $\delta_i$. A discrete event class is often a
singleton because in that case, two discrete event
classes $C_i = \{(x_i, \delta_i)\}$ and
$C_j = \{(x_j, \delta_j)\}$ are only linked with the
variables $x_i$ and $x_j$ when the constants $\delta_i$
and $\delta_j$ are independent (Le Goc, 2006). This
condition concerns only the programs that make up the
MCA. A sequence of discrete event class occurrences is
then considered as the observable manifestation of a
series of state transitions in a timed stochastic
automaton representing the couple (Pr, MCA). The BJT4G
algorithm represents a set of sequences of discrete
event class occurrences with a first-order Markov chain
and uses abductive reasoning to identify the set of the
most probable timed sequential binary relations between
discrete
event classes leading to a given class. A timed
sequential binary relation
$R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}])$ is an oriented
relation between two discrete event classes $C_i$ and
$C_j$ that is time-constrained by the interval
$[\tau^-_{ij}, \tau^+_{ij}]$, the time interval for
observing an occurrence of the $C_j$ class after an
occurrence of the $C_i$ class. The set of timed
sequential binary relations is an abstract chronicle
model where the nodes are discrete event classes and the
links are timed sequential binary relations. This paper
proposes to tackle the
two main problems of the Process Mining approaches
by extending the TOM4L Approach. The first ideas of
this approach have been presented in
(Benayadi et al., 2008).
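To make the notion of a timed sequential binary relation more concrete, the following Python sketch estimates, from timed sequences of discrete event class occurrences, a first-order transition probability for each ordered pair of consecutive classes together with an interval of observed delays. It is only an illustration and not the BJT4G algorithm: the use of the empirical minimum and maximum delay as the interval bounds is an assumption, no abductive reasoning is performed, and the function name `timed_binary_relations` is hypothetical.

```python
from collections import defaultdict


def timed_binary_relations(sequences):
    """sequences: list of time-ordered lists of (class_name, timestamp).

    For each pair (Ci, Cj) of consecutive occurrences, return an estimate
    of the first-order transition probability P(Cj | Ci) and the observed
    [min, max] delay between the two occurrences (assumed interval)."""
    pair_counts = defaultdict(int)
    source_counts = defaultdict(int)
    delays = defaultdict(list)

    for seq in sequences:
        # Walk over consecutive occurrences only (first-order dependence).
        for (ci, ti), (cj, tj) in zip(seq, seq[1:]):
            pair_counts[(ci, cj)] += 1
            source_counts[ci] += 1
            delays[(ci, cj)].append(tj - ti)

    return {
        pair: {
            "probability": pair_counts[pair] / source_counts[pair[0]],
            "interval": (min(observed), max(observed)),
        }
        for pair, observed in delays.items()
    }


if __name__ == "__main__":
    # Hypothetical timed observations: (class name, timestamp).
    observations = [
        [("C1", 0), ("C2", 3), ("C3", 5)],
        [("C1", 10), ("C2", 14)],
        [("C1", 20), ("C3", 21)],
    ]
    for pair, relation in sorted(timed_binary_relations(observations).items()):
        print(pair, relation)
```

Each entry of the returned dictionary can be read as a candidate relation between two classes with its delay interval; selecting the most probable relations leading to a given class is left out of this sketch.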
2.2 Sequence Alignment
Before introducing the extension to the TOM4L
Approach, it is necessary to introduce some ideas about
sequence alignment, which are needed to understand the
steps proposed in the algorithms for modeling
large-scale manufacturing processes.
A sequence alignment is a way of arranging sequences
to identify regions of similarity
between them. Aligned sequences are usually rep-
resented as rows within a matrix. Gaps (’-’) are in-
serted between the residues so that identical or simi-
ICEIS 2010 - 12th International Conference on Enterprise Information Systems