DISCOVERING LARGE SCALE MANUFACTURING PROCESS
MODELS FROM TIMED DATA
Application to STMicroelectronics’ Production Processes
Pamela Viale$^{1,2}$, Nabil Benayadi$^{1}$, Marc Le Goc$^{1}$ and Jacques Pinaton$^{2}$
$^{1}$ LSIS, Laboratory for Information and System Sciences, University of Marseille, Marseille, France
$^{2}$ STMicroelectronics, Rousset, France
Keywords:
Process model discovery, Temporal knowledge discovering, Markov processes, Sequence alignment.
Abstract:
Modeling the manufacturing processes of complex products such as electronic chips is crucial to maximizing production quality. The Process Mining methods developed over the last decade aim at modeling such manufacturing processes from the timed messages contained in the database of the process supervision system. Such processes can be complex, which makes the usual Process Mining algorithms difficult to apply. This paper proposes to apply the TOM4L Approach to model large scale manufacturing processes. A series of timed messages is considered as a sequence of class occurrences and is represented with a Markov chain from which models are deduced with abductive reasoning. Because sequences can be very long, a notion of process phase based on a concept of class of equivalence is defined to cut the sequences so that a model of each phase can be produced locally. The model of the whole manufacturing process is then obtained by concatenating the models of the different phases. This paper presents the application of this method to model STMicroelectronics’ manufacturing processes. STMicroelectronics’ interest in modeling its manufacturing processes stems from the need to detect discrepancies between the real processes and the experts’ definitions of them.
1 INTRODUCTION
Modeling manufacturing processes for the construction of complex objects like electronic chips is crucial for maximizing productivity. The methods and algorithms developed in the Process Mining domain aim at modeling such manufacturing processes with graphs of manufacturing steps (treatments, operations or tasks) from the timed messages contained in the database (Cook and Wolf, 1998). The proposed algorithms generally generate complex models that are difficult to read and interpret when the process contains hundreds of steps.
This paper proposes a modeling method and the corresponding algorithms to model manufacturing processes having hundreds of steps. The proposed method is based on cutting the sequences into subsequences called process phases.
Section 2 briefly recalls the main approaches of the Process Mining area and the main characteristics of the TOM4L framework. Section 3 defines the notions of class of equivalence and process phase that we propose and describes the algorithms required for this purpose. Section 4 presents the application of the algorithms to model STMicroelectronics’ manufacturing processes. These manufacturing processes are used for creating electronic chips, microcontrollers, etc. The resulting models will be useful to check that the implemented process corresponds to the experts’ ideas. The discrepancies found will help to alert the experts about possible problems in production. Section 5 presents a critical evaluation that shows the importance of the equivalence class concept. Section 6 summarizes the proposed method. The paper concludes in section 7 with an introduction to our current work.
This work has been made possible thanks to the financial support of STMicroelectronics, Convention 642/2008.
2 RELATED WORKS
In the Process Mining framework, a series of messages is considered as an ordered set of events from which a process model can be inferred and represented in some formalism (workflows, state charts or
Viale P., Benayadi N., Le Goc M. and Pinaton J. (2010).
DISCOVERING LARGE SCALE MANUFACTURING PROCESS MODELS FROM TIMED DATA - Application to STMicroelectronics’ Production
Processes.
In Proceedings of the 5th International Conference on Software and Data Technologies, pages 227-235
DOI: 10.5220/0002930302270235
Copyright © SciTePress
Petri nets, for example) (van der Aalst and Weijters, 2004). One of the first algorithms was proposed in (Agrawal et al., 1998). It aims at finding workflow graphs from a set of series of events contained in a workflow log. An event represents the start time of a task. To avoid the problem of potential cycles (i.e. repeated events in a series), the algorithm first renames the repeated task labels before enumerating the binary dependency relations between the tasks. This set of relations is then reduced using the transitivity property of the binary relations. Labels are then renamed again to merge the tasks, which makes it possible to introduce cycles in the model. Difficulties arise with this approach when (i) the tasks are statistically independent and (ii) the number of tasks is large (Agrawal et al., 1998). Nevertheless, Pinter and Golani (Pinter and Golani, 2004) extend this algorithm, notably by introducing events that mark the end of the tasks. Similar issues in the context of software engineering processes are investigated in (Cook and Wolf, 1998), where the aim is to build a finite state machine from the set of the most frequent event patterns mined in a given log. In particular, the Markov algorithm is based on a second-order Markov chain that is converted into states and state transitions. Cook and Wolf (Cook and Wolf, 2004) extend this method to concurrent processes and use a first-order Markov chain for this purpose. The difficulties come from the pruning of the finite state machine to obtain a minimal model and from the sensitivity of the pruning metrics to "noise" (van der Aalst and Weijters, 2004). Van der Aalst et al. (van der Aalst et al., 2004) define the class of processes that can be modeled with the α-algorithm, but this algorithm requires the series of events in the log to be noise-free and complete.
There is a consensus that finite state machines are difficult to understand and to validate. Moreover, most of the proposed methods have difficulties when (i) the process contains many steps, (ii) the series in the log induce potential cycles in the models and (iii) the sequences are not noise-free and complete. The TOM4L Approach (Timed Observations Mined for Learning, previously called Stochastic Approach Framework) (Le Goc et al., 2005) for discovering temporal knowledge from timed observations provides a general framework for modeling dynamic processes that is based on a Markovian representation but uses abstract chronicle models (Ghallab, 1996) instead of finite state machines. This framework considers that the timed messages of a series are written in a database by a program, called a monitoring cognitive agent $MCA$, that monitors a production process $Pr$. A timed message is represented with an occurrence of a discrete event class $C_i = \{e_i\}$, which is an arbitrary set of discrete events $e_i = (x_i, \delta_i)$, where $\delta_i$ is one of the discrete values of the variable $x_i$. When the variable $x_i$ is not known, an abstract variable $\phi_i$ is used to define the discrete event $e_i = (\phi_i, \delta_i)$ corresponding to the constant $\delta_i$. A discrete event class is often a singleton because, in that case, two discrete event classes $C_i = \{(x_i, \delta_i)\}$ and $C_j = \{(x_j, \delta_j)\}$ are only linked with the variables $x_i$ and $x_j$ when the constants $\delta_i$ and $\delta_j$ are independent (Le Goc, 2006). This condition only concerns the programs the $MCA$ is made of. A sequence of discrete event class occurrences is then considered as the observable manifestation of a series of state transitions in a timed stochastic automaton representing the couple $(Pr, MCA)$. The BJT4G algorithm represents a set of sequences of discrete event class occurrences with a first-order Markov chain and uses abductive reasoning to identify the set of the most probable timed sequential binary relations between discrete event classes leading to a given class. A timed sequential binary relation $R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}])$ is an oriented relation between two discrete event classes $C_i$ and $C_j$ that is time-constrained by the interval $[\tau^-_{ij}, \tau^+_{ij}]$, i.e. the time interval for observing an occurrence of the $C_j$ class after an occurrence of the $C_i$ class. The set of timed sequential binary relations is an abstract chronicle model where the nodes are discrete event classes and the links are timed sequential binary relations. This paper proposes to tackle the two main problems of the Process Mining approaches with an extension of the TOM4L Approach. The first ideas of this approach were presented in (Benayadi et al., 2008).
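To make the Markovian representation concrete, the following minimal Python sketch estimates a first-order Markov chain and the observed delay bounds from timed sequences. It is not the BJT4G algorithm, whose abductive reasoning is not reproduced here. The function names, the `(label, time)` encoding of an occurrence and the numeric timestamps are assumptions made for illustration, and the observed minimum and maximum delays are only a simple stand-in for the interval $[\tau^-_{ij}, \tau^+_{ij}]$.

```python
from collections import Counter, defaultdict


def first_order_markov(sequences):
    """Estimate P(next class | current class) from timed sequences.

    Each sequence is a list of (class_label, time) pairs (hypothetical encoding).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for (c_i, _), (c_j, _) in zip(seq, seq[1:]):
            counts[c_i][c_j] += 1
    return {
        c_i: {c_j: n / sum(row.values()) for c_j, n in row.items()}
        for c_i, row in counts.items()
    }


def delay_intervals(sequences):
    """Observed (min, max) delay between consecutive occurrences of C_i then C_j."""
    bounds = {}
    for seq in sequences:
        for (c_i, t_i), (c_j, t_j) in zip(seq, seq[1:]):
            delay = t_j - t_i
            low, high = bounds.get((c_i, c_j), (delay, delay))
            bounds[(c_i, c_j)] = (min(low, delay), max(high, delay))
    return bounds


# Two short illustrative sequences with made-up timestamps.
sequences = [
    [("A", 0), ("B", 2), ("E", 5)],
    [("A", 10), ("B", 13), ("C", 14)],
]
transitions = first_order_markov(sequences)
intervals = delay_intervals(sequences)
```

In this toy input, `A` is always followed by `B`, while `B` is followed by `E` or `C` with equal frequency; a relation-selection step such as the one BJT4G performs would operate on this kind of statistics.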
3 EXTENSION OF THE TOM4L APPROACH FOR PROCESS MINING
3.1 Motivation
Let us take an example to illustrate the proposed extensions with a manufacturing process having a set $S = \{A, B, C, D, E, F\}$ of 6 manufacturing steps. Suppose the supervision system records the execution of a step with a message $X(t_k)$ denoting the time $t_k$ at which the execution of the step $X$ begins. The three series of messages of Table 1 are represented with the abstract chronicle model of Figure 1. In this model, if the nodes labeled with $A$ denote the same manufacturing step, then these nodes can be merged, which introduces a cycle in the model. The same reasoning applies to the other nodes, making the model difficult to read and to understand.
Table 1: Three series of events.
A(t_1) B(t_2) F(t_3) B(t_4) E(t_5) D(t_6) C(t_7) A(t_8) E(t_9) B(t_{10})
A(t_{11}) B(t_{12}) E(t_{13}) C(t_{14}) D(t_{15}) A(t_{16}) E(t_{17}) B(t_{18})
A(t_{19}) B(t_{20}) E(t_{21}) C(t_{22}) D(t_{23}) A(t_{24}) E(t_{25}) B(t_{26})
Figure 1: Model for the three sequences.
This repetition of manufacturing steps can be due to a situation that occurs frequently during the execution of a manufacturing process. When an object is being manufactured, it goes from one machine to another to receive all the necessary treatments. Nevertheless, the usual production controls sometimes show that certain objects did not achieve the expected characteristics. Some special actions then have to be taken to correct the problem. These actions usually consist of performing special treatments and/or repeating tasks.
This is the reason why we need to align the different process execution sequences to identify equivalent treatments over the different objects being produced in each execution.
Given a set $\Omega = \{\omega_i\}_{i=1,\dots,n}$ of sequences obtained from the execution of the same manufacturing process, we need a 'good' sequence alignment $SA(\Omega)$ between them. By 'good' we mean an alignment that places similar treatments made on the different objects in the same column and aligns special tasks (tasks performed to correct errors) with gaps ('-').
Two possible 'good' alignments for the sequences shown in Table 1 are proposed in Table 2.
Suppose now that each of the three sequences is cut when a label is aligned with exactly the same one in all sequences and this alignment appears twice. The model of the first part of the sequences will be similar to the one of Figure 1 but without the path $A \rightarrow E \rightarrow B$ at the end of the model. This fact motivates the notion of process phase proposed in this paper. But this notion is not sufficient to solve the cycles that are introduced with the steps $C$ and $D$. We consider that this problem is due to the fact that the three series of events provide no information about the order of the steps $C$ and $D$. Consequently, any solution to this problem must take into account some a priori knowledge about the process, which we want to avoid. The notion of potential cycle is then defined to detect this kind of situation so that further investigations can be made (e.g. finding new series or discussing with experts).
To illustrate the notion of class of equivalence, let us take the Edit activity of the "writing a scientific paper" process, which can be performed by different students and professors. The Edit activities can then be labeled differently according to the performer with a set of classes of the form $C = \{C_{E1} = \{(s_1, \delta_1)\}, C_{E2} = \{(s_2, \delta_2)\}, \dots\}$, where the variables $s_i$ and $p_i$ denote respectively students and professors. In this case, the resulting model of the process will be needlessly complex. One of the interesting features of the TOM4L Approach is the notion of discrete event class. This notion can be used to define abstract classes of the form $C_{\phi_i} = \{(\phi_i, \delta_1), \dots, (\phi_i, \delta_n)\}$ where $\phi_i$ denotes an abstract variable and the set $\{\delta_j\}$, $j = 1 \dots n$, is an arbitrary set of constants. This property allows defining classes of equivalence that simplify a process model. For example, an abstract class $C_{\phi_i}$ can be defined as an equivalence class of the set of classes $C$ of the "writing a scientific paper" process. Doing so makes it possible to build a set of sequences coming from different students and professors.
3.2 Class of Equivalence
By definition, a large scale process consists of many steps. Some of these steps differ only in some characteristics but perform similar treatments.
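Definition 1. Given a model $M = \{R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}])\}$ built with a set $\Omega = \{\omega_i\}$ of sequences $\omega_i$ of discrete event class occurrences, a class $C_{\phi_i} = \{(\phi_i, \delta_1), (\phi_i, \delta_2), \dots, (\phi_i, \delta_n)\}$ is an equivalence class of a subset of classes $C = \{C_j\}$, $j = 1 \dots n$, $C_j = \{(x_j, \delta_j)\}$, of the set of classes $C_M$ of a model $M$ iff:
$$
\begin{aligned}
\forall C_a, C_b \in C:\quad & \forall C_k \in C_M,\ \big(R(C_k, C_a, [\tau^-_{ka}, \tau^+_{ka}]) \in M \Leftrightarrow R(C_k, C_b, [\tau^-_{kb}, \tau^+_{kb}]) \in M\big) \\
\wedge\ & \forall C_r \in C_M,\ \big(R(C_a, C_r, [\tau^-_{ar}, \tau^+_{ar}]) \in M \Leftrightarrow R(C_b, C_r, [\tau^-_{br}, \tau^+_{br}]) \in M\big)
\end{aligned}
\tag{1}
$$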
This definition means that all the classes $C_j$ of the subset $C \subseteq C_M$ are linked with the same classes in $M$. When this condition is verified, each occurrence $C_j(t_s)$ of the classes $C_j$ in the sequences $\omega_i$ of $\Omega$ can be rewritten as an occurrence $C_{\phi_i}(t_s)$ of the equivalence class $C_{\phi_i}$. The abstract variable $\phi_i$ has no a priori meaning: $\phi_i$ can be substituted with the corresponding concrete variable $x_i$ in any occurrence $C_{\phi_i}(t_s)$. Consequently, the set of uphill relations $\{R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}])\}$ of $M$ will become $\{R(C_i, C_{\phi_i}, [\tau^-_{i\phi_i}, \tau^+_{i\phi_i}])\}$ and the set of downhill relations $\{R(C_j, C_k, [\tau^-_{jk}, \tau^+_{jk}])\}$ of $M$ will become $\{R(C_{\phi_i}, C_k, [\tau^-_{\phi_i k}, \tau^+_{\phi_i k}])\}$. In practice, an equivalence class can be used to represent related discrete event classes. In the application presented in the next section, a discrete event class represents a set of treatments that can be made on an object using a particular machine. Equivalence classes are then used to represent this set of treatments made by different machines: in that case, the machines are equivalent because the same set of treatments can be done on each of them.
Table 2: Two sequence alignments proposed for the sequences of Table 1.
SA_1 =
A(t_1)    B(t_2)    F(t_3)  B(t_4)  E(t_5)    D(t_6)  C(t_7)    -         A(t_8)    E(t_9)    B(t_{10})
A(t_{11}) B(t_{12}) -       -       E(t_{13}) -       C(t_{14}) D(t_{15}) A(t_{16}) E(t_{17}) B(t_{18})
A(t_{19}) B(t_{20}) -       -       E(t_{21}) -       C(t_{22}) D(t_{23}) A(t_{24}) E(t_{25}) B(t_{26})
SA_2 =
A(t_1) B(t_2) F(t_3) B(t_4) E(t_5) D(t_6) C(t_7) A(t_8) E(t_9) B(t_{10})
A(t_{11}) B(t_{12}) E(t_{13}) C(t_{14}) D(t_{15}) A(t_{16}) E(t_{17}) B(t_{18})
A(t_{19}) B(t_{20}) E(t_{21}) C(t_{22}) D(t_{23}) A(t_{24}) E(t_{25}) B(t_{26})
Given a set of sequences $\Omega = \{\omega_i\}_{i=1,\dots,n}$, the algorithm for defining equivalence classes finds all the equivalence classes and rewrites the corresponding occurrences in each sequence $\omega_i$ (Algorithm 1):
1. Build a model $M$ from the set of sequences $\Omega = \{\omega_i\}_{i=1,\dots,n}$.
2. Find all the subsets of classes $C$ verifying equation 1.
3. For each subset $C$:
- Create an equivalence class $C_{\phi_i}$.
- For all $C_j \in C$, rewrite all the occurrences $C_j(t_s)$ in all the sequences $\omega_i \in \Omega$ with the rewriting rule $C_j(t_s) \rightarrow C_{\phi_i}(t_s)$.
4. Build a new model $M'$ with $\Omega$.
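The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the model $M$ is reduced to a set of `(C_i, C_j)` pairs (time intervals dropped), equation 1 is read as "same predecessors and same successors", and the names `equivalence_classes`, `rewrite` and the `phi` prefix are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(relations):
    """Group classes that share exactly the same predecessors and successors.

    `relations` is an iterable of (C_i, C_j) pairs, i.e. the binary relations
    of a model M with their time intervals dropped (simplifying assumption).
    """
    preds, succs = defaultdict(set), defaultdict(set)
    classes = set()
    for c_i, c_j in relations:
        succs[c_i].add(c_j)
        preds[c_j].add(c_i)
        classes.update((c_i, c_j))
    groups = defaultdict(list)
    for c in classes:
        key = (frozenset(preds[c]), frozenset(succs[c]))
        groups[key].append(c)
    # Only groups with at least two members simplify the model.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def rewrite(sequences, groups, prefix="phi"):
    """Apply the rewriting rule C_j(t_s) -> C_phi(t_s) to every occurrence."""
    mapping = {}
    for n, group in enumerate(groups):
        for c in group:
            mapping[c] = f"{prefix}{n}"
    return [[(mapping.get(c, c), t) for c, t in seq] for seq in sequences]


# Toy model: machines M1 and M2 both follow X and both precede Y.
relations = [("X", "M1"), ("X", "M2"), ("M1", "Y"), ("M2", "Y")]
groups = equivalence_classes(relations)
rewritten = rewrite([[("X", 0), ("M1", 1), ("Y", 2)]], groups)
```

In the toy model, `M1` and `M2` play the role of two machines able to perform the same treatments, so they are merged into a single abstract class.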
3.3 Process Phase
The information contained in a series of manufacturing messages concerns both the state of the manufactured object and the manufacturing process that makes this state evolve from an initial state to a final state. But generally, the state of the manufactured object is not provided with the messages. We therefore propose to capture this dimension indirectly with the notion of Process Phase. However, before introducing this new concept, we must formally define what is called a Sequence Alignment.
3.3.1 Sequence Alignment
A sequence alignment is a way of arranging sequences to identify regions of similarity between them. Aligned sequences are usually represented as rows within a matrix $M = [m_{i,j}]_{i \in \{1,\dots,n\},\ j \in \{1,\dots,r\}}$. Gaps ('-') are inserted between the residues so that identical or similar residues are aligned in successive columns. In our problem, the residues are the discrete event class occurrences.
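Definition 2. Given a model $M = \{R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}])\}$ built with a set $\Omega = \{\omega_i\}$ of sequences of discrete event class occurrences, a sequence alignment $SA$ between all the sequences $\omega_i \in \Omega$ is defined as follows:
$$
\begin{aligned}
SA(\Omega) = \{ M_{n \times r} \mid\ & \forall \omega_j \in \Omega,\ r \geq card(\omega_j) \\
\wedge\ & \forall m_{ij} \in M_{n \times r},\ m_{ij} \in C_M \cup \{\text{'-'}\} \\
\wedge\ & \forall m_{ij}, m_{i(j+1)} \in M_{n \times r}, \\
& (m_{ij} = C_j(t_k),\ C_j(t_k) \in \omega_i\ \wedge\ m_{i(j+1)} = C_l(t_x),\ C_l(t_x) \in \omega_i \Rightarrow t_x > t_k) \}
\end{aligned}
\tag{2}
$$
where $card(s)$ is the length of sequence $s$.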
There are many possible alignments between $n$ sequences. Many algorithms exist for creating sequence alignments, based on different objectives and strategies (Thompson et al., 1994; Notredame et al., 2000; Needleman and Wunsch, 1970; Smith and Waterman, 1981).
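As an illustration of one of the cited strategies, the sketch below implements a basic pairwise global alignment in the style of Needleman and Wunsch (1970) over sequences of class labels. It is only a sketch: the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions, it aligns two sequences rather than $n$, and it is not the alignment procedure used in this paper.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two label sequences; returns two rows of equal length."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return row_a[::-1], row_b[::-1]


# Labels of the first two series of Table 1.
seq_1 = list("ABFBEDCAEB")
seq_2 = list("ABECDAEB")
row_1, row_2 = needleman_wunsch(seq_1, seq_2)
```

Several alignments can reach the same optimal score, so the rows returned depend on the scoring values and on the tie-breaking order of the traceback; this is one reason why the choice of a 'good' alignment matters for the steps that follow.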
Given a set of sequences $\Omega = \{\omega_i\}_{i=1,\dots,n}$ and a sequence alignment $SA(\Omega)$, we propose an algorithm for renaming classes. This algorithm traverses all the sequences and renames a class occurrence if, in the alignment, it is aligned with classes different from itself or with gaps (Algorithm 2):
$\forall \omega_i \in \Omega$, $\forall C_j(t_k) \in \omega_i$ do:
Rename the occurrence $C_j(t_k)$ when:
- $\exists C_i(t_s) \in \omega_m$, $\omega_m \in \Omega$, such that $C_j(t_k)$ is aligned with $C_i(t_s)$ in $SA(\Omega)$ and $C_i \neq C_j$; or
- $C_j(t_k)$ is aligned with a gap ('-') in $SA(\Omega)$.
Table 3 shows the series of Table 1 after applying Algorithm 2 to them with the sequence alignment $SA_1$.
Table 3: Series of events after renaming with Algorithm 2.
A(t_1) B(t_2) F_{1,3}(t_3) B_{1,4}(t_4) E(t_5) D_{1,6}(t_6) C(t_7) A(t_8) E(t_9) B(t_{10})
A(t_{11}) B(t_{12}) E(t_{13}) C(t_{14}) D_{2,15}(t_{15}) A(t_{16}) E(t_{17}) B(t_{18})
A(t_{19}) B(t_{20}) E(t_{21}) C(t_{22}) D_{3,23}(t_{23}) A(t_{24}) E(t_{25}) B(t_{26})
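A minimal Python sketch of the renaming rule of Algorithm 2 is given below. The alignment encoding (rows of `"-"` or `(label, time_index)` cells), the helper `make_row` and the renamed label format `X_i,k` (sequence index, time index) are assumptions; the label format is inferred from Table 3, and the gap positions used for $SA_1$ are those implied by the renaming shown in that table.

```python
GAP = "-"


def make_row(labels, first_index):
    """Build an alignment row from a string such as 'A B - C' (hypothetical helper)."""
    row, k = [], first_index
    for label in labels.split():
        if label == GAP:
            row.append(GAP)
        else:
            row.append((label, k))
            k += 1
    return row


def rename_occurrences(alignment):
    """Rename every occurrence aligned with a different class or with a gap.

    Returns the sequences (gaps removed) as lists of labels.
    """
    n_cols = len(alignment[0])
    renamed = []
    for i, row in enumerate(alignment, start=1):
        seq = []
        for j in range(n_cols):
            cell = row[j]
            if cell == GAP:
                continue
            label, k = cell
            column = [r[j] for r in alignment]
            # The occurrence keeps its name only if the whole column holds the same class.
            same_everywhere = all(c != GAP and c[0] == label for c in column)
            seq.append(label if same_everywhere else f"{label}_{i},{k}")
        renamed.append(seq)
    return renamed


sa_1 = [
    make_row("A B F B E D C - A E B", 1),
    make_row("A B - - E - C D A E B", 11),
    make_row("A B - - E - C D A E B", 19),
]
renamed = rename_occurrences(sa_1)
```

With this input, the occurrences renamed in the three sequences are the ones marked in Table 3.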
3.3.2 Process Phase Concept
As said before, the concept of process phase is used to capture the different stages that an object must go through in the manufacturing process to reach the expected final state. For this task we need a 'good' sequence alignment of the sequences used to build the model. The reason is that this alignment provides a guideline for distinguishing normal treatments from special tasks performed in particular executions of the process. Aligned tasks are supposed to be regular tasks whereas non-aligned tasks are not.
Definition 3. Given a set of sequences $\Omega = \{\omega_i\}_{i=1,\dots,n}$ and a sequence alignment $SA(\Omega)$, a process phase of the model $M$ constructed using Algorithm 2 is a submodel $M' = \{R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}]) \mid C_i, C_j \in C_{M'}\} \subseteq M$ such that there is no path $P = \{R(C_i, C_{i+1}, [\tau^-_{ii+1}, \tau^+_{ii+1}])\} \subseteq M'$, $i = 1 \dots n$, where:
$$
\exists j, k \in \mathbb{N},\ 1 \leq j < k \leq n,\ C_j = C_k \tag{3}
$$
Algorithm 3 aims at cutting a set $\Omega = \{\omega_i\}_{i=1,\dots,n}$ of sequences into subsequences $\omega_i^k$ that respect equation 3 (i.e. $\omega_i^k$ does not contain two occurrences of the same class):
1. $\forall \omega_i \in \Omega$ do:
- Remove $\omega_i$ from $\Omega$.
- Cut up $\omega_i$ into a set $\Omega_i = \{\omega_i^k\}$ of subsequences $\omega_i^k$ verifying equation 3.
2. $\forall \omega_i^k \in \Omega_i$ do:
- Add an occurrence of the $C_0$ and $C_1$ classes at the beginning and the end of $\omega_i^k$.
3. $\Omega = \bigcup \Omega_i$.
An occurrence of an abstract start class $C_0$ and an occurrence of an abstract final class $C_1$ are added at the beginning and the end of each subsequence $\omega_i^k$ so that the BJT4G algorithm automatically identifies the process phases. For example, when applying Algorithm 3 to the three sequences of Table 1, the BJT4G algorithm will find two process phases. The second process phase will be: $\{R(C_0, C_A, [\tau^-_{0A}, \tau^+_{0A}]),\ R(C_A, C_E, [\tau^-_{AE}, \tau^+_{AE}]),\ R(C_E, C_B, [\tau^-_{EB}, \tau^+_{EB}]),\ R(C_B, C_1, [\tau^-_{B1}, \tau^+_{B1}])\}$.
It follows directly from the definition that process phases depend on the chosen sequence alignment. A 'good' sequence alignment is therefore necessary to correctly identify the different stages in the process.
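The cutting step of Algorithm 3 can be sketched as follows. The paper does not specify how the cut points are chosen, so this sketch assumes a greedy strategy: a new subsequence starts as soon as a class label would repeat. The names `cut_into_phases`, `START` and `END` are hypothetical, and the input used here is the first renamed series of Table 3.

```python
START, END = "C0", "C1"  # abstract start and final classes


def cut_into_phases(sequence):
    """Greedily cut a sequence so that no subsequence repeats a class label.

    Each subsequence is then wrapped with the abstract START and END classes.
    """
    phases, current, seen = [], [], set()
    for label in sequence:
        if label in seen:
            # The label would repeat: close the current phase and open a new one.
            phases.append(current)
            current, seen = [], set()
        current.append(label)
        seen.add(label)
    if current:
        phases.append(current)
    return [[START, *phase, END] for phase in phases]


omega_1 = ["A", "B", "F_1,3", "B_1,4", "E", "D_1,6", "C", "A", "E", "B"]
phases = cut_into_phases(omega_1)
```

For this input the sketch yields two subsequences, and the second one is `C0, A, E, B, C1`, which corresponds to the chain of relations given above for the second process phase.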
3.4 Potential Cycles
When looking at the model of Figure 1, it is clear that the classes $C$ and $D$ introduce a cycle. Cycles raise a serious interpretation problem and make the resulting models hard to understand. This explains why many works aim at avoiding cycles in process models (cf. (Cook and Wolf, 2004), (Schimm, 2004), (van der Aalst et al., 2004), (Pinter and Golani, 2004), (Weijters and van der Aalst, 2003) or (Agrawal et al., 1998) for example). But these works make assumptions about the process or impose constraints on the constitution of the sequences. In all cases, this amounts to having some a priori knowledge about the process to be modeled or about the set of programs that write the messages in the process database.
The aim of the TOM4L Approach is to provide models of sequences without any a priori knowledge about the process and the set of programs that have generated the occurrences. One difficulty is that cycles often appear when mining a process because of the transitivity property of the sequential binary relations.
Property 1. The timed sequential binary relations $R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}])$ of a given abstract chronicle model $M = \{R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}])\}$ are transitive:
$$
\begin{aligned}
& \forall R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}]) \in M,\ \forall R(C_j, C_k, [\tau^-_{jk}, \tau^+_{jk}]) \in M, \\
& R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}]) \wedge R(C_j, C_k, [\tau^-_{jk}, \tau^+_{jk}]) \Rightarrow R_T(C_i, C_k, [\tau^-_{ik}, \tau^+_{ik}])
\end{aligned}
\tag{4}
$$
Definition 4. Given a process model $M$, two discrete event classes $C_i$ and $C_j$ are not ordered when:
$$
M \vdash R_T(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}])\ \wedge\ M \vdash R_T(C_j, C_i, [\tau^-_{ji}, \tau^+_{ji}]) \tag{5}
$$
Two classes $C_i$ and $C_j$ that cannot be ordered in a model will be denoted $C_i \parallel C_j \equiv C_j \parallel C_i$.
For example, the three sequences of Table 1 do not provide any order between the classes $C_C$ and $C_D$ (Figure 1). Consequently: $C_C \parallel C_D \equiv C_D \parallel C_C$.
Property 2. The set of discrete event classes $C^{\parallel} = \{C_1, C_2, \dots, C_n\}$ cannot be ordered when:
$$
\forall C_i \in C^{\parallel},\ \forall C_j \in C^{\parallel},\ C_i \parallel C_j. \tag{6}
$$
In graph theory, the classes of a set $C^{\parallel}$ form a strongly connected component. Algorithm 4 aims at detecting potential cycles (i.e. at defining the sets $C^{\parallel}$):
1. Build a process model $M$ from $\Omega$ with the BJT4G algorithm.
2. Build the set $\mathcal{C}^{\parallel} = \{C^{\parallel}_i\}$ of the sets $C^{\parallel}_i$ of classes without order with equation 5.
3. $\forall C^{\parallel}_i \in \mathcal{C}^{\parallel}$ do:
- Remove the relations $R(C_i, C_j, [\tau^-_{ij}, \tau^+_{ij}])$ of $M$ where $C_i \in C^{\parallel}_i$ or $C_j \in C^{\parallel}_i$.
- Generate all the paths $P = \{P_k\}$ with $P_k = \{R(C_i, C_{i+1}, [\tau^-_{ii+1}, \tau^+_{ii+1}])\}$ where $C_i \in C^{\parallel}_i$ and $C_{i+1} \in C^{\parallel}_i$.
- Insert the relations of the paths $P$ in $M$.
To avoid adding a priori knowledge about the process or the programs, Algorithm 4 computes all the paths linking the classes in $C^{\parallel}$ (cf. the model of Figure 1 with the classes $C$ and $D$). It is clear that if $Card(C^{\parallel}) = n$, there are $n!$ possible paths. But it is a simple way to put the emphasis on potential cycles.
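The detection of unordered classes and the enumeration of the $n!$ candidate paths can be sketched as follows. This is a simplified illustration, not the authors' implementation: the model is reduced to `(C_i, C_j)` pairs without time intervals, mutual reachability in the transitive closure is used as the reading of equation 5, and the function names are hypothetical. The relations used as input are those induced by the steps $E$, $C$, $D$ and $A$ in the series of Table 1.

```python
from itertools import permutations


def reachable(relations):
    """Transitive closure: maps each class to the set of classes reachable from it."""
    succ = {}
    for a, b in relations:
        succ.setdefault(a, set()).add(b)
        succ.setdefault(b, set())
    closure = {}
    for start in succ:
        seen, stack = set(), list(succ[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(succ[node])
        closure[start] = seen
    return closure


def unordered_sets(relations):
    """Sets of at least two classes that are mutually reachable (potential cycles)."""
    closure = reachable(relations)
    groups, assigned = [], set()
    for c in sorted(closure):
        if c in assigned:
            continue
        group = {c} | {d for d in closure[c] if c in closure[d]}
        if len(group) > 1:
            groups.append(sorted(group))
        assigned |= group
    return groups


def candidate_paths(group):
    """All n! orderings of the unordered classes, each as a list of relations."""
    return [list(zip(p, p[1:])) for p in permutations(group)]


relations = [("E", "C"), ("E", "D"), ("C", "D"), ("D", "C"), ("C", "A"), ("D", "A")]
groups = unordered_sets(relations)
paths = candidate_paths(groups[0])
```

The factorial growth of `candidate_paths` is the cost mentioned above: it stays manageable only as long as the sets of unordered classes remain small.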
3.5 Modeling a Large Scale Process
Algorithm 5 aims at modeling a large scale manufacturing process. It simply uses the four algorithms provided in the preceding subsections. Given a set of sequences $\Omega = \{\omega_i\}_{i=1,\dots,n}$ and a sequence alignment $SA(\Omega)$, Algorithm 5 finds a process model $M$ with the BJT4G algorithm:
1. Rewrite the sequences of $\Omega$ with Algorithm 1.
2. Rewrite the sequences of $\Omega$ with Algorithm 2 using information from $SA(\Omega)$.
3. Produce the sets $\Omega_k$ of subsequences $\omega_i^k$ with Algorithm 3.
4. $\forall \Omega_k$ do:
- Build a process model $M_k$ of the phase $k$ with Algorithm 4.
5. $M = \bigcup M_k$.
Applied to the sequences of Table 1, this algorithm provides the model of Figure 1. This algorithm has also been used to model the wafer manufacturing process of the Rousset (France) plant of STMicroelectronics.
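The last two steps of Algorithm 5 (one model per phase, then their union) can be sketched as follows. This is a deliberately reduced illustration: the phase model is approximated by the set of consecutive-class pairs observed in the subsequences of that phase, which stands in for the model that BJT4G and Algorithm 4 would produce, and nodes are tagged with the phase index so that the models of different phases remain distinct in the union. All names are hypothetical.

```python
def phase_model(subsequences, k):
    """Stand-in for the model M_k of phase k: consecutive-class pairs tagged with k."""
    return {
        ((k, a), (k, b))
        for seq in subsequences
        for a, b in zip(seq, seq[1:])
    }


def whole_model(phases_per_sequence):
    """Build one model per phase and return (list of M_k, their union M).

    phases_per_sequence[i][k] is the k-th subsequence of the i-th sequence.
    """
    n_phases = max((len(p) for p in phases_per_sequence), default=0)
    models = []
    for k in range(n_phases):
        omega_k = [p[k] for p in phases_per_sequence if k < len(p)]
        models.append(phase_model(omega_k, k))
    return models, set().union(*models)


# Two sequences, each already cut into two phases and wrapped with C0 / C1.
phases_per_sequence = [
    [["C0", "A", "B", "C1"], ["C0", "A", "E", "C1"]],
    [["C0", "A", "B", "C1"], ["C0", "A", "E", "C1"]],
]
models, model = whole_model(phases_per_sequence)
```

Tagging each node with its phase index is a design choice of this sketch: it keeps the occurrences of a class in different phases apart, which is what prevents the cycles discussed in section 3.1 from reappearing in the union.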
4 APPLICATION
The aim of STMicroelectronics is to improve the control of the wafer manufacturing processes through the definition of human-scale process models and a better knowledge of the timed constraints between the different manufacturing steps.
A "wafer" is a silicon plate used for the construction of STMicroelectronics’ products. A wafer manufacturing process is a series of elementary treatments called "recipes" that are executed on machines called "equipments". An "operation" is a particular series of recipes, and each recipe is associated with different equipments, namely those that are qualified for performing the treatment. A complete series of operations is called a "manufacturing route". The STMicroelectronics plant located in Rousset, in the south of France, has more than 10,000 different recipes, each manufacturing route is composed of about 400 operations and there are currently more than 300 equipments in the plant. The supervision system of the wafer manufacturing process describes a manufacturing route with messages providing the name of a recipe, the machine on which the recipe is executed, the corresponding operation and the start and finish times of the recipe.
To validate the approach presented in this paper, a set $\Omega$ containing 5 sequences $\omega_i$, $i = 1, \dots, 5$, has been extracted from STMicroelectronics’ databases. These sequences contain real production data from the company. For the construction of this example we considered an extract of each of these 5 sequences: only the first 200 executed recipes were used.
Algorithm 5 has been applied at the equipment level, so that a process model $M$ represents a manufacturing route by a series of equipments. Even though we worked with extracts of real routes, the volume of data was large. In the constructed example, 127 discrete event classes have been found. A class is defined with a singleton $C_i = \{(\phi_i, i)\}$ where the constant $i$ is a natural number in the interval $[1000, 1126]$. Each of these classes represents a particular equipment in STMicroelectronics’ plant.
We illustrate the application of the approach proposed in this paper by showing how it works on two subsequences of the sequences $\omega_1$ and $\omega_2$ used for the example construction. These two subsequences can be seen in Figure 2.
Figure 2: Subsequence of $\omega_1$ (up) and subsequence of $\omega_2$ (down).
During the example construction, 67 different equivalence classes have been found. The equivalence classes created using Algorithm 1 are singletons of the form $C_j = \{(\phi_j, j)\}$ with $j \in [6000, 6066]$. Consider that the equivalence class $C_{6008} = \{(\phi_{6008}, 6008)\}$ contains the discrete event classes $\{C_{1030}, C_{1031}, C_{1032}, C_{1033}\}$ and that, similarly, $C_{6009}$ contains $\{C_{1010}, C_{1011}, C_{1012}\}$, $C_{6031}$ contains $\{C_{1050}, C_{1051}, C_{1052}\}$, $C_{6040}$ contains $\{C_{1040}, C_{1041}\}$, $C_{6042}$ contains $\{C_{1020}\}$, $C_{6046}$ contains $\{C_{1060}, C_{1061}\}$, $C_{6056}$ contains $\{C_{1080}, C_{1081}\}$ and $C_{6065}$ contains $\{C_{1070}, C_{1071}\}$. Algorithm 1 rewrites the two subsequences of Figure 2 and produces the subsequences of Figure 3.
Figure 3: Subsequences of $\omega_1$ and $\omega_2$ rewritten with Algorithm 1.
A hand-made sequence alignment has been used. The alignment corresponding to the two subsequences is shown in Figure 4.
Figure 4: Subsequences of $\omega_1$ and $\omega_2$ aligned.
After applying Algorithm 2 to rename classes according to the alignment shown in Figure 4, the subsequences of $\omega_1$ and $\omega_2$ are rewritten as shown in Figure 5.
Figure 5: Subsequences of $\omega_1$ and $\omega_2$ rewritten using Algorithm 2.
Algorithm 3 finds two process phases ($i$ and $i+1$) for the considered subsequence of $\omega_1$ and for the considered subsequence of $\omega_2$. Process phase $i$ can be seen in Figure 6 and process phase $i+1$ is shown in Figure 7.
Figure 6: $\omega_1^{i}$ (up) and $\omega_2^{i}$ (down).
Figure 7: $\omega_1^{i+1}$ (up) and $\omega_2^{i+1}$ (down).
Using Algorithm 4 on the phases found, the corresponding models were constructed. For the two subsequences used to illustrate the approach, we obtain the models shown in Figures 8 and 9.
Figure 8: Model for phase $i$ obtained from $\omega_1^{i}$ and $\omega_2^{i}$.
Figure 9: Model for phase $i+1$ obtained from $\omega_1^{i+1}$ and $\omega_2^{i+1}$.
During the construction of the whole example model, 19 process phases were found. The resulting model contains a total of 302 nodes.
Figure 10 shows the model constructed for the 5th phase found. The volume of data in this small example is large even though we worked with extracts of manufacturing routes. The model obtained using this method is, however, simple and easy to understand. The resulting model has already been validated by STMicroelectronics’ experts.
The next step is to build a model for complete sequences and for bigger sets (more than 5 sequences). Afterwards, it would be valuable to construct models for other levels of granularity (the operation level or the recipe level are some options).
Figure 10: Phase 5 of the model.
5 CRITICAL EVALUATION / COMPARISON
The original idea of our method arose from the need for a general algorithm able to reduce the number of nodes when modeling large scale manufacturing processes. The idea was then to find equivalences between different data. The result of this idea was the introduction of the equivalence class concept. These equivalence classes have different interpretations depending on the level of granularity considered.
Most of the algorithms in the literature are good at creating models when the different activities that constitute a process are known. This would correspond to information at the recipe level in STMicroelectronics’ manufacturing processes. However, if we consider only information about the machines used and the execution time of each recipe, as we did in section 4, the existing algorithms of the process mining area produce models with a high level of redundancy.
To illustrate the problem, we applied the methods proposed in (Agrawal et al., 1998) and (Cook and Wolf, 1998) to the subsequences shown in Figure 2. The resulting models are shown in Figure 11. Afterwards, we compared the two models obtained with the model constructed using our approach in section 4 (concatenation of the submodels of Figures 8 and 9).
(a) Model constructed using the (Agrawal et al., 1998) approach
(b) Model constructed using the (Cook and Wolf, 1998) approach
Figure 11: Models constructed for the subsequences of Figure 2.
In Figure 11, each node represents a machine that executes a certain recipe. For the level of abstraction considered in the example, the methods proposed in (Agrawal et al., 1998) and (Cook and Wolf, 1998) tend to construct all possible paths between all the machines capable of executing the different recipes of the manufacturing processes. By contrast, the nodes in the model constructed using our approach are associated with the equivalence classes found. Each equivalence class represents a group of machines capable of executing the same subset of recipes. The model constructed using our approach is therefore smaller than the ones obtained with the other methods at this level of abstraction. Moreover, the resulting model helps the experts concentrate on the task carried out at each step rather than on the machine used to perform it.
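The node reduction obtained by grouping machines can be illustrated with a minimal sketch. It assumes that two machines are equivalent when they can execute exactly the same set of recipes, which is one possible reading of the definition used here. The capability table, the machine names and the functions `equivalence_classes` and `relabel` are hypothetical and do not come from STMicroelectronics data.

```python
from collections import defaultdict

# Hypothetical capability table: machine -> recipes it can execute.
# The names are illustrative only.
CAPABILITIES = {
    "M1": {"r1", "r2"},
    "M2": {"r1", "r2"},
    "M3": {"r3"},
    "M4": {"r3"},
    "M5": {"r1", "r2"},
}


def equivalence_classes(capabilities):
    """Group machines that can execute exactly the same set of recipes."""
    groups = defaultdict(list)
    for machine, recipes in capabilities.items():
        groups[frozenset(recipes)].append(machine)
    # Sort members and classes so that the result is deterministic.
    return sorted(sorted(members) for members in groups.values())


def relabel(sequence, classes):
    """Replace each machine of a sequence by the index of its class."""
    index = {m: i for i, members in enumerate(classes) for m in members}
    return [index[m] for m in sequence]


classes = equivalence_classes(CAPABILITIES)

# Two observed machine sequences that differ only in the machines used.
seq_a = relabel(["M1", "M3", "M2"], classes)
seq_b = relabel(["M5", "M4", "M1"], classes)
```

Under these assumptions, five machine nodes collapse into two class nodes, and the two sequences become identical once relabeled, so a model built on classes does not need one path per machine combination.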
Another drawback of using the approach proposed in (Agrawal et al., 1998) to model STMicroelectronics' processes is its reliance on process graphs. In this type of model, the edges represent control conditions and the vertices represent activities. Any activity appears only once in a process graph as a vertex label. STMicroelectronics' manufacturing route definitions involve the repetition of several tasks. However, these repetitions are controlled: the re-execution of a task is only possible for certain states of the manufactured objects and not for others. If we try to use process graphs to model these manufacturing processes, identifying the states of the manufactured objects becomes difficult, if not impossible.
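The limitation can be made concrete with a simplified sketch. It is not the algorithm of (Agrawal et al., 1998); it only reproduces the constraint that each activity labels a single vertex. The route, the task names and the functions below are hypothetical.

```python
# Hypothetical route: task "A", then "B", then "A" again, then "C".
ROUTE = ["A", "B", "A", "C"]


def activity_graph(route):
    """One vertex per activity label; one edge per pair of consecutive tasks."""
    return set(zip(route, route[1:]))


def graph_accepts(edges, start, end, trace):
    """True if the trace is a walk from start to end in the activity graph."""
    if not trace or trace[0] != start or trace[-1] != end:
        return False
    return all(step in edges for step in zip(trace, trace[1:]))


def route_accepts(route, trace):
    """State-based check: the state is the position reached in the route."""
    return list(trace) == list(route)


edges = activity_graph(ROUTE)

skipped = ["A", "C"]                      # "B" and the second "A" never occur
looping = ["A", "B", "A", "B", "A", "C"]  # the repetition is executed twice

graph_ok = [graph_accepts(edges, "A", "C", t) for t in (ROUTE, skipped, looping)]
route_ok = [route_accepts(ROUTE, t) for t in (ROUTE, skipped, looping)]
```

Because both executions of "A" share one vertex, the graph cannot tell whether the object has already gone through "B", so it accepts the two deviating traces that the state-based check rejects.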
6 CONCLUSIONS
This paper presents the TOM4L Approach for mod-
eling manufacturing processes from timed data con-
tained in a supervision system database. One of the
ICSOFT 2010 - 5th International Conference on Software and Data Technologies
interesting features of this approach is the notion of discrete event class. This notion is used to define the concept of process phase and the equivalence classes of discrete events, both of which are required for large scale manufacturing processes. The definition of these concepts leads to a global algorithm that has been applied to model STMicroelectronics' (Rousset) manufacturing processes. This concrete application shows that the extensions of the TOM4L framework are operational. The resulting models will be a valuable tool for STMicroelectronics to control production and to alert experts when the real activity of the company does not follow their theoretical definitions.
7 CURRENT WORK
Current work is devoted to adapting the sequence alignment algorithms used in genetics (Notredame et al., 2000). The objective is to find an analogue of the work presented in (Gauthier et al., 2008), which was carried out in the social sciences. Our idea is to find a way to calculate the similarity values between different event classes in any kind of sequence, without introducing expert knowledge about their contents.
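As a point of reference, the kind of global alignment this work starts from can be sketched in the style of (Needleman and Wunsch, 1970). The sketch below is not the adaptation under study: it takes the similarity function as an input, whereas the aim stated above is precisely to compute such values. The function names, the identity-based similarity and the gap penalty of -1 are assumptions chosen for illustration.

```python
def global_alignment_score(seq_a, seq_b, similarity, gap=-1):
    """Score of the best global alignment of two sequences (dynamic programming)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix with the empty sequence costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                # Align the two current elements.
                score[i - 1][j - 1] + similarity(seq_a[i - 1], seq_b[j - 1]),
                # Insert a gap in seq_b.
                score[i - 1][j] + gap,
                # Insert a gap in seq_a.
                score[i][j - 1] + gap,
            )
    return score[-1][-1]


def identity_similarity(x, y):
    """Placeholder similarity: 1 for identical event classes, -1 otherwise."""
    return 1 if x == y else -1


same = global_alignment_score("abc", "abc", identity_similarity)
one_gap = global_alignment_score("abc", "ac", identity_similarity)
```

Replacing `identity_similarity` with a table of similarity values between event classes changes the alignment without changing the recurrence, which is why the choice of these values matters.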
REFERENCES
Agrawal, R., Gunopulos, D., and Leymann, F. (1998). Min-
ing process models from workflow logs. In Sixth In-
ternational Conference on Extending Database Tech-
nology, pages 469–483.
Benayadi, N., Le Goc, M., and Bouché, P. (2008). Using the
stochastic approach framework to model large scale
manufacturing processes. Proceedings of the 3rd International
Conference on Software and Data Technologies (ICSoft 2008).
Cook, E. J. and Wolf, A. L. (1998). Discovering models
of software processes from event-based data. ACM
Transactions on Software Engineering and Methodol-
ogy, 7:215–249.
Cook, E. J. and Wolf, A. L. (2004). Event-based detec-
tion of concurrency. In Proceedings of the 6th ACM
SIGSOFT international symposium on Foundations of
software engineering, volume 53, pages 35–45.
Gauthier, J. A., Widmer, E. D., Bucher, P., and Notredame,
C. (2008). How much does it cost? optimization of
costs in sequence analysis of social science data. So-
ciological Methods and Research.
Ghallab, M. (1996). On chronicles: Representation, on-line
recognition and learning. Proc. Principles of Knowledge
Representation and Reasoning, Aiello, Doyle and Shapiro (Eds.)
Morgan Kaufmann, pages 597–606.
Le Goc, M. (2006). Notion d’observation pour le diagnostic
des processus dynamiques: Application à Sachem et à la
découverte de connaissances temporelles. HDR, Faculté des
Sciences et Techniques de Saint Jérôme.
Le Goc, M., Bouché, P., and Giambiasi, N. (2005). Stochastic
modeling of continuous time discrete event sequence for
diagnosis. 16th International Workshop on Principles of
Diagnosis (DX’05), California, USA.
Needleman, S. B. and Wunsch, C. D. (1970). A general
method applicable to the search of similarities in the
amino acid sequence of two proteins. J. Mol. Biol.,
48:443–453.
Notredame, C., Higgins, D. G., and Heringa, J. (2000). T-
coffee: A novel method for fast and accurate multiple
sequence alignment. J. Mol. Biol., 302:205–217.
Pinter, S. and Golani, M. (2004). Discovering workflow
models from activities’ lifespans. In Special issue:
Process/workflow mining, volume 53, pages 283–296.
Schimm, G. (2004). Mining exact models of concurrent
workflows. In Computers in Industry, volume 53(3),
pages 265–281.
Smith, T. F. and Waterman, M. S. (1981). Identification
of common molecular subsequences. J. Mol. Biol.,
147:195–197.
Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994).
CLUSTAL W: improving the sensitivity of progressive multiple
sequence alignment through sequence weighting, position-specific
gap penalties and weight matrix choice. Nucleic Acids Research,
22:4673–4680.
van der Aalst, W., Weijters, T., and Maruster, L. (2004).
Workflow mining: Discovering process models from
event logs. In IEEE Transactions on Knowledge and
Data Engineering, volume 16, pages 1128–1142.
van der Aalst, W. M. P. and Weijters, A. J. M. M. (2004).
Process mining. Special issue of Computers in Indus-
try, 53:231–244.
Weijters, A. and van der Aalst, W. (2003). Rediscovering
workflow models from event-based data using little
thumb. In Integrated Computer-Aided Engineering,
volume 10(2), pages 151–162.