DISCOVERING LARGE SCALE MANUFACTURING PROCESS MODELS FROM TIMED DATA

Application to STMicroelectronics' Production Processes∗

Pamela Viale 1,2, Nabil Benayadi, Marc Le Goc and Jacques Pinaton

LSIS, Laboratory for Information and System Sciences, University of Marseille, Marseille, France

STMicroelectronics, Rousset, France

Keywords:

Process model discovery, Temporal knowledge discovering, Markov processes, Sequence alignment.

Abstract:

Modeling the manufacturing processes of complex products like electronic chips is crucial to maximize the quality of the production. The Process Mining methods developed over the last decade aim at modeling such a manufacturing process from the timed messages contained in the database of its supervision system. Such processes can be complex, which makes it difficult to apply the usual Process Mining algorithms. This paper proposes to apply the TOM4L Approach to model large scale manufacturing processes. A series of timed messages is considered as a sequence of class occurrences and is represented with a Markov chain from which models are deduced with abductive reasoning. Because sequences can be very long, a notion of process phase based on a concept of class of equivalence is defined to cut the sequences so that a model of a phase can be produced locally. The model of the whole manufacturing process is then obtained from the concatenation of the models of the different phases. This paper presents the application of this method to model STMicroelectronics' manufacturing processes. STMicroelectronics' interest in modeling its manufacturing processes is based on the necessity to detect the discrepancies between the real processes and the experts' definitions of them.

1 INTRODUCTION

Modeling manufacturing processes for the construction of complex objects like electronic chips is crucial for maximizing productivity. The methods and the algorithms developed in the Process Mining domain aim at modeling such manufacturing processes with graphs of manufacturing steps (treatments, operations or tasks) from the timed messages contained in the database (Cook and Wolf, 1998). The proposed algorithms generally generate complex models that are difficult to read and interpret when the process contains hundreds of steps.

This paper proposes a modeling method and the corresponding algorithms to model manufacturing processes having hundreds of steps. The proposed method is based on cutting the sequences into subsequences called process phases.

Section 2 briefly recalls the main approaches of the Process Mining area and the main characteristics of the TOM4L framework. Section 3 defines the notions of class of equivalence and process phase that we propose and describes the algorithms required to this aim. Section 4 presents the application of the algorithms to model STMicroelectronics' manufacturing processes. These manufacturing processes are used for creating electronic chips, microcontrollers, etc. The resulting models will be useful to check that the implemented process corresponds to the experts' ideas. The discrepancies found will help to alert the experts about possible problems in production. Section 5 presents a critical evaluation showing the importance of the equivalent class concept. Section 6 summarizes the proposed method. The paper concludes in Section 7 with the introduction of our current work.

∗ This work has been made possible thanks to the financial support of STMicroelectronics, Convention 642/2008.

2 RELATED WORKS

In the Process Mining framework, a series of messages is considered as an ordered set of events from which a process model can be inferred and represented in some formalism (workflows, state charts or Petri nets, for example) (van der Aalst and Weijters, 2004). One of the first algorithms was proposed in (Agrawal et al., 1998). The algorithm aims at finding workflow graphs from a set of series of events contained in a workflow log. An event represents the start time of a task. To avoid the problem of potential cycles (i.e. repeated events in a series), the algorithm first renames the repeated task labels before enumerating the binary dependency relations between the tasks. This set of relations is then reduced using the transitivity property of the binary relations. Labels are then renamed again to merge the tasks, which makes the introduction of cycles in the model possible. Difficulties arise with this approach when (i) the tasks are statistically independent and (ii) the number of tasks is large (Agrawal et al., 1998). Nevertheless, Pinter (Pinter and Golani, 2004) extends this algorithm, notably with the introduction of events marking the end of the tasks. Similar issues in the context of software engineering processes are investigated in (Cook and Wolf, 1998), where the aim is to build a finite state machine from the set of the most frequent event patterns mined in a given log. In particular, the Markov algorithm is based on a second-order Markov chain that is converted into states and state transitions. Cook and Wolf (Cook and Wolf, 2004) extend this method to concurrent processes and use a first-order Markov chain for this aim. The difficulties come from the pruning of the finite state machine to obtain a minimal model and from the sensitivity of the pruning metrics to "noise" (van der Aalst and Weijters, 2004). Van der Aalst (van der Aalst et al., 2004) defines the class of processes that can be modeled with the α-algorithm, but this algorithm requires the series of events in the log to be noise-free and complete.

Viale P., Benayadi N., Le Goc M. and Pinaton J. (2010). DISCOVERING LARGE SCALE MANUFACTURING PROCESS MODELS FROM TIMED DATA - Application to STMicroelectronics' Production Processes. In Proceedings of the 5th International Conference on Software and Data Technologies, pages 227-235. DOI: 10.5220/0002930302270235. SciTePress.

There is a consensus that finite state machines are difficult to understand and to validate. Moreover, most of the proposed methods have difficulties when (i) the process contains many steps, (ii) the series in the log induce potential cycles in the models and (iii) the sequences are not noise-free and complete. The TOM4L Approach (Timed Observations Mined for Learning, previously called Stochastic Approach Framework) (Le Goc et al., 2005) for discovering temporal knowledge from timed observations provides a general framework for modeling dynamic processes that is based on a Markovian representation but uses abstract chronicle models (Ghallab, 1996) instead of finite state machines. This framework considers that the timed messages of a series are written in a database by a program, called a monitoring cognitive agent MCA, that monitors a production process Pr. A timed message is represented with an occurrence of a discrete event class $C^i = \{e_i\}$, which is an arbitrary set of discrete events $e_i = (x_i, \delta_i)$, where $\delta_i$ is one of the discrete values of the variable $x_i$. When the variable $x_i$ is not known, an abstract variable $\phi_i$ is used to define the discrete event $e_i = (\phi_i, \delta_i)$ corresponding to the constant $\delta_i$. A discrete event class is often a singleton because, in that case, two discrete event classes $C^i = \{(x_i, \delta_i)\}$ and $C^j = \{(x_j, \delta_j)\}$ are only linked with the variables $x_i$ and $x_j$ when the constants $\delta_i$ and $\delta_j$ are independent (Le Goc, 2006). This condition is only concerned with the programs the MCA is made with. A sequence of discrete event class occurrences is then considered as the observable manifestation of a series of state transitions in a timed stochastic automaton representing the couple (Pr, MCA). The BJT4G algorithm represents a set of sequences of discrete event class occurrences with a first-order Markov chain and uses abductive reasoning to identify the set of the most probable timed sequential binary relations between discrete event classes leading to a given class. A timed sequential binary relation $R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}])$ is an oriented relation between two discrete event classes $C^i$ and $C^j$ that is time constrained with the interval $[\tau^-_{ij}, \tau^+_{ij}]$. $[\tau^-_{ij}, \tau^+_{ij}]$ is the time interval for observing an occurrence of the $C^j$ class after an occurrence of the $C^i$ class. The set of timed sequential binary relations is an abstract chronicle model where the nodes are discrete event classes and the links are timed sequential binary relations. This paper proposes to tackle the two main problems of the Process Mining approaches with an extension of the TOM4L Approach. The first ideas of this approach have been presented in (Benayadi et al., 2008).
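To make the first-order Markov representation mentioned above more concrete, the following minimal sketch estimates transition probabilities between class labels from a set of sequences. It is not the BJT4G algorithm: it ignores timestamps and the abductive reasoning step, and the function name `transition_probabilities` and the toy sequences are assumptions introduced only for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities between class labels.

    Each sequence is a list of class labels in order of occurrence.
    Timing information is deliberately ignored in this sketch.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every pair (current class, directly following class).
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


# Hypothetical toy log (not taken from the paper's data).
toy_sequences = [
    ["A", "B", "E", "C", "D"],
    ["A", "B", "E", "D", "C"],
]
P = transition_probabilities(toy_sequences)
```

In this toy log, `P["E"]` assigns equal weight to `C` and `D`, which is the kind of situation where the sequences alone give no order between two classes.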

3 EXTENSION OF THE TOM4L APPROACH FOR PROCESS MINING

3.1 Motivation

Let us take an example to illustrate the proposed extensions with a manufacturing process having a set $S = \{A, B, C, D, E, F\}$ of 6 manufacturing steps. Suppose the supervision system records the execution of a step with a message $X(t_k)$ denoting the time $t_k$ of the beginning of the execution of the step $X$. The three series of messages of Table 1 are represented with the abstract chronicle model of Figure 1. In this model, if the nodes labeled with A denote the same manufacturing step, then these nodes can be merged, introducing a cycle in the model. The same reasoning can be applied to the other nodes, making the model difficult to read and to understand.

Table 1: Three series of events.

A(t_1)  B(t_2)  F(t_3)  B(t_4)  E(t_5)  D(t_6)  C(t_7)  A(t_8)  E(t_9)  B(t_10)
A(t_11) B(t_12) E(t_13) C(t_14) D(t_15) A(t_16) E(t_17) B(t_18)
A(t_19) B(t_20) E(t_21) C(t_22) D(t_23) A(t_24) E(t_25) B(t_26)

Figure 1: Model for the three sequences.

This repetition of manufacturing steps can be due to a situation that arises frequently during the execution of a manufacturing process. When an object is being manufactured, it goes from one machine to another to receive all the necessary treatments. Nevertheless, the usual production controls sometimes show that certain objects did not achieve the expected characteristics. Some special actions have to be taken to correct the problem. These actions usually consist in performing special treatments and/or repeating tasks.

This is the reason why we need to align the different process execution sequences to identify equivalent treatments over the different objects being produced in each execution.

Given a set $\Omega = \{\omega_i\}_{i=1,\ldots,n}$ of sequences obtained from the execution of the same manufacturing process, we need a 'good' sequence alignment between them, $SA(\Omega)$. By 'good' we mean an alignment that aligns similar treatments made on the different objects in the same column, and special tasks (tasks performed to correct errors) with gaps ('-'). Two possible 'good' alignments for the sequences shown in Table 1 are proposed in Table 2.

Suppose now that each of the three sequences is cut at the point where a label, aligned with exactly the same label in all the sequences, appears for the second time. The model of the first part of the sequences will be similar to the one of Figure 1 but without the path A - E - B at the end of the model. This fact motivates the notion of process phase proposed in this paper. But this notion is not sufficient to solve the cycles that are introduced with the steps C and D. We consider that this problem is due to the fact that the three series of events provide no information about the order of the steps C and D. Consequently, any solution to this problem must take into account some a priori knowledge about the process, which we want to avoid. The notion of potential cycle is then defined to detect this kind of situation so that further investigations can be made (finding new series or discussing with experts, for example).

To illustrate the notion of class of equivalence, let us take the Edit activity of the "writing a scientific paper" process, which can be performed by different students and professors. The Edit activities can then be labeled differently according to the performer with a set of classes of the form $\mathcal{C} = \{C^{s_i} = \{(s_i, \delta_i)\}, C^{p_j} = \{(p_j, \delta_j)\}, \ldots\}$, where the variables $s_i$ and $p_j$ denote respectively students and professors. In this case, the resulting model of the process will be needlessly complex. One of the interesting features of the TOM4L Approach is the notion of discrete event class. This notion can be used to define abstract classes of the form $C^\phi = \{(\phi, \delta_1), \ldots, (\phi, \delta_n)\}$ where $\phi$ denotes an abstract variable and the set $\{\delta_j\}$, $j = 1 \ldots n$, is an arbitrary set of constants. This property allows defining classes of equivalence that simplify a process model. For example, an abstract class $C^\phi$ can be defined as an equivalent class of the set of classes $\mathcal{C}$ of the "writing a scientific paper" process. Doing so makes it possible to constitute a set of sequences coming from different students and professors.

3.2 Class of Equivalence

By definition, a large scale process consists of a lot of steps. Some of these steps differ only in some characteristics but carry out similar treatments.

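Definition 1. Given a model $M = \{R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}])\}$ built with a set $\Omega = \{\omega_i\}$ of sequences of discrete event class occurrences $\omega_i$, a class $C^\phi = \{(\phi, \delta_1), (\phi, \delta_2), \ldots, (\phi, \delta_n)\}$ is an equivalence class of a subset of classes $\mathcal{C} = \{C^j\}$, $j = 1 \ldots n$, $C^j = \{(x_j, \delta_j)\}$, of the set of classes $\mathcal{C}_M$ of a model $M$ iff:

$$\forall C^j, C^l \in \mathcal{C}:\quad \forall C^i \in \mathcal{C}_M \left(\exists R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}]) \in M \wedge \exists R(C^i, C^l, [\tau^-_{il}, \tau^+_{il}]) \in M\right) \ \wedge\ \forall C^k \in \mathcal{C}_M \left(\exists R(C^j, C^k, [\tau^-_{jk}, \tau^+_{jk}]) \in M \wedge \exists R(C^l, C^k, [\tau^-_{lk}, \tau^+_{lk}]) \in M\right) \quad (1)$$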

This definition means that all the classes $C^j$ of the subset $\mathcal{C} \subseteq \mathcal{C}_M$ are linked with the same classes in $M$. When this condition is verified, each occurrence $C^j(t_k)$ of the classes $C^j$ in the sequences of $\Omega$ can be rewritten as an occurrence $C^\phi(t_k)$ of the equivalence class $C^\phi$. The abstract variable $\phi$ has no a priori meaning: $\phi$ can be substituted with the corresponding concrete variable $x_j$ in any occurrence $C^\phi(t_k)$. Consequently, the set of uphill relations $\{R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}])\}$ of $M$ becomes $\{R(C^i, C^\phi, [\tau^-_{i\phi}, \tau^+_{i\phi}])\}$ and the set of downhill relations $\{R(C^j, C^k, [\tau^-_{jk}, \tau^+_{jk}])\}$ of $M$ becomes $\{R(C^\phi, C^k, [\tau^-_{\phi k}, \tau^+_{\phi k}])\}$. In practice, an equivalence class can be used to represent related discrete event classes. In the application presented in the next section, a discrete event class represents a set of treatments that can be made on an object using a particular machine. Equivalence classes are then used to represent this set of treatments made by different machines: in that case, the machines are equivalent because the same set of treatments can be done on each of them.

Table 2: Two sequence alignments proposed for the sequences of Table 1.

First alignment:

A(t_1)  B(t_2)  F(t_3)  B(t_4)  E(t_5)  D(t_6)  C(t_7)  -       A(t_8)  E(t_9)  B(t_10)
A(t_11) B(t_12) -       -       E(t_13) -       C(t_14) D(t_15) A(t_16) E(t_17) B(t_18)
A(t_19) B(t_20) -       -       E(t_21) -       C(t_22) D(t_23) A(t_24) E(t_25) B(t_26)

Second alignment:

A(t_1)  B(t_2)  F(t_3)  B(t_4)  E(t_5)  -       D(t_6)  C(t_7)  A(t_8)  E(t_9)  B(t_10)
A(t_11) B(t_12) -       -       E(t_13) C(t_14) D(t_15) -       A(t_16) E(t_17) B(t_18)
A(t_19) B(t_20) -       -       E(t_21) C(t_22) D(t_23) -       A(t_24) E(t_25) B(t_26)

Given a set of sequences $\Omega = \{\omega_i\}_{i=1,\ldots,n}$, the algorithm for defining equivalence classes finds all the equivalence classes and rewrites the corresponding occurrences in each sequence $\omega_i$ (Algorithm 1):

1. Build a model $M$ from the set of sequences $\Omega = \{\omega_i\}_{i=1,\ldots,n}$.
2. Find all the subsets of classes $\mathcal{C}$ verifying equation 1.
3. For each subset $\mathcal{C}$:
   • Create an equivalence class $C^\phi$.
   • For all $C^j \in \mathcal{C}$, rewrite all the occurrences $C^j(t_k)$ in all the sequences $\omega_i \in \Omega$ with the rewriting rule: $C^j(t_k) \equiv C^\phi(t_k)$.
4. Build a new model $M$ with the rewritten $\Omega$.
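The steps of Algorithm 1 can be sketched as follows, under simplifying assumptions that are not part of the paper: the model is reduced to the set of directly-follows pairs of class labels (instead of the timed relations produced by BJT4G), two classes are grouped when they have exactly the same predecessors and successors, and the names `build_relations`, `find_equivalence_classes`, `rewrite` and the `EQ` labels are hypothetical.

```python
from collections import defaultdict


def build_relations(sequences):
    """Simplified model: the set of (class, directly following class) pairs."""
    return {(a, b) for seq in sequences for a, b in zip(seq, seq[1:])}


def find_equivalence_classes(relations):
    """Group classes that share the same predecessors and the same successors."""
    preds, succs = defaultdict(set), defaultdict(set)
    for src, dst in relations:
        succs[src].add(dst)
        preds[dst].add(src)
    groups = defaultdict(list)
    for cls in sorted(set(preds) | set(succs)):
        key = (frozenset(preds[cls]), frozenset(succs[cls]))
        groups[key].append(cls)
    # Only groups with several members are worth replacing by one class.
    return [members for members in groups.values() if len(members) > 1]


def rewrite(sequences, equivalence_groups):
    """Replace every member of a group by the label of its equivalence class."""
    mapping = {}
    for k, members in enumerate(equivalence_groups):
        for cls in members:
            mapping[cls] = f"EQ{k}"
    return [[mapping.get(label, label) for label in seq] for seq in sequences]


# Hypothetical example: machines M1 and M2 perform the same treatment.
toy = [["prep", "M1", "rinse"], ["prep", "M2", "rinse"]]
groups = find_equivalence_classes(build_relations(toy))
rewritten = rewrite(toy, groups)
new_model = build_relations(rewritten)  # step 4: rebuild the model
```

After rewriting, both toy sequences read `prep, EQ0, rinse`, so the rebuilt model has three nodes instead of four.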

3.3 Process Phase

The information contained in a series of manufacturing messages is concerned with both the state of the manufactured object and the manufacturing process that makes this state evolve from an initial state up to a final state. But generally, the state of the manufactured object is not provided with the messages. We therefore propose to capture this dimension indirectly with the notion of Process Phase. However, before introducing this new concept, we must formally define what is called a Sequence Alignment.

3.3.1 Sequence Alignment

A sequence alignment is a way of arranging sequences to identify regions of similarity between them. Aligned sequences are usually represented as rows within a matrix $M = [m_{i,j}]_{i \in \{1,\ldots,n\},\, j \in \{1,\ldots,r\}}$. Gaps ('-') are inserted between the residues so that identical or similar residues are aligned in successive columns. In our problem, the residues are the discrete event class occurrences.

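Definition 2. Given a model $M = \{R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}])\}$ built with a set $\Omega = \{\omega_i\}$ of sequences of discrete event class occurrences, a sequence alignment $SA$ between all the sequences $\omega_i \in \Omega$ is defined as follows:

$$SA(\Omega) = \{\, M_{n \times r} \mid \forall \omega_i \in \Omega,\ r \geq card(\omega_i)\ \wedge\ \forall m_{ij} \in M_{n \times r},\ m_{ij} \in \mathcal{C}_M \cup \{-\}\ \wedge\ \forall m_{ij}, m_{i(j+1)} \in M_{n \times r},\ (m_{ij} = C^p(t_k),\ C^p(t_k) \in \omega_i\ \wedge\ m_{i(j+1)} = C^q(t_l),\ C^q(t_l) \in \omega_i\ \wedge\ t_l > t_k) \,\} \quad (2)$$

where $card(s)$ is the length of the sequence $s$.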

There are many possible alignments between $n$ sequences. Many algorithms exist for creating sequence alignments, based on different objectives and strategies (Thompson et al., 1994; Notredame et al., 2000; Needleman and Wunsch, 1970; Smith and Waterman, 1981).
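As an illustration of one of the cited strategies, the sketch below implements a pairwise global alignment in the style of Needleman and Wunsch on sequences of class labels. It is only an example of how an alignment can be computed automatically: the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions, the function handles two sequences only, and it is not the procedure used by the authors.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two label sequences; returns two rows of equal length."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Dynamic programming table of best scores for prefixes of a and b.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return row_a[::-1], row_b[::-1]


# The first two series of Table 1, without timestamps.
seq1 = ["A", "B", "F", "B", "E", "D", "C", "A", "E", "B"]
seq2 = ["A", "B", "E", "C", "D", "A", "E", "B"]
row1, row2 = needleman_wunsch(seq1, seq2)
```

Several alignments can share the same optimal score, so the rows returned here need not coincide with either alignment of Table 2.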

Given a set of sequences $\Omega = \{\omega_i\}_{i=1,\ldots,n}$ and a sequence alignment $SA(\Omega)$, we propose an algorithm for renaming classes. This algorithm traverses all the sequences and renames a class occurrence if, in the alignment, it is aligned with classes different from itself or with gaps (Algorithm 2):

• $\forall \omega_i \in \Omega$, $\forall C^j(t_k) \in \omega_i$ do:
  - Rename the occurrence $C^j(t_k)$ when:
    - $\exists C^l(t_m) \in \omega_p$, $\omega_p \in \Omega$, such that $C^j(t_k)$ is aligned with $C^l(t_m)$ in $SA(\Omega)$ and $C^j \neq C^l$; or
    - $C^j(t_k)$ is aligned with a gap ('-') in $SA(\Omega)$.

Table 3 shows the series of Table 1 after applying Algorithm 2 to them, considering the first sequence alignment of Table 2.

Table 3: Series of events after renaming with Algorithm 2.

A(t_1)  B(t_2)  F_{1,3}(t_3) B_{1,4}(t_4)  E(t_5)        D_{1,6}(t_6) C(t_7)  A(t_8)  E(t_9)  B(t_10)
A(t_11) B(t_12) E(t_13)      C(t_14)       D_{2,15}(t_15) A(t_16)     E(t_17) B(t_18)
A(t_19) B(t_20) E(t_21)      C(t_22)       D_{3,23}(t_23) A(t_24)     E(t_25) B(t_26)
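The renaming rule of Algorithm 2 can be sketched as below. The alignment is given as a list of rows of equal length, and a renamed occurrence receives a suffix made of its row number and its position in its own sequence; this suffix scheme and the name `rename_unaligned` are assumptions made for the example (the tables of the paper index the renamed occurrences with the timestamp number instead).

```python
GAP = "-"


def rename_unaligned(alignment):
    """Rename every occurrence aligned with a gap or with a different label.

    `alignment` is a list of rows of equal length; each cell is a class
    label or GAP. Gaps are dropped from the returned sequences.
    """
    n_cols = len(alignment[0])
    renamed = []
    for r, row in enumerate(alignment, start=1):
        out = []
        for c in range(n_cols):
            label = row[c]
            if label == GAP:
                continue
            column = [other[c] for other in alignment]
            if any(cell != label for cell in column):
                # Hypothetical naming: label_row,position-in-sequence.
                out.append(f"{label}_{r},{len(out) + 1}")
            else:
                out.append(label)
        renamed.append(out)
    return renamed


# First alignment of Table 2, without timestamps.
first_alignment = [
    ["A", "B", "F", "B", "E", "D", "C", "-", "A", "E", "B"],
    ["A", "B", "-", "-", "E", "-", "C", "D", "A", "E", "B"],
    ["A", "B", "-", "-", "E", "-", "C", "D", "A", "E", "B"],
]
renamed = rename_unaligned(first_alignment)
```

With this alignment, the occurrences of F, the second B and D are renamed in the first sequence, and D is renamed in the two others, which matches the pattern of renamed occurrences in the table.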

3.3.2 Process Phase Concept

As said before, the concept of process phase is used to capture the different stages that an object must go through in the manufacturing process to reach the expected final state. For this task we need a 'good' sequence alignment of the sequences that we use to build the model. The reason is that this alignment provides a guideline to distinguish normal treatments from special tasks performed in particular executions of the process. Aligned tasks are supposed to be regular tasks whereas non-aligned tasks are not.

Definition 3. Given a set of sequences $\Omega = \{\omega_i\}_{i=1,\ldots,n}$ and a sequence alignment $SA(\Omega)$, a process phase of the model $M$ constructed using Algorithm 2 is a submodel $M_k = \{R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}]) \mid C^i, C^j \in \mathcal{C}_M\} \subseteq M$, so that there is no path $P = \{R(C^i, C^{i+1}, [\tau^-_{i,i+1}, \tau^+_{i,i+1}])\} \in M_k$, $i = 1 \ldots n$, where:

$$\forall j, k \in \mathbb{N},\ 1 \leq j < k \leq n,\ C^j = C^k \quad (3)$$

Algorithm 3 aims at cutting a set $\Omega = \{\omega_i\}_{i=1,\ldots,n}$ of sequences into subsequences $\omega_i^k$ that respect equation 3 (i.e. $\omega_i^k$ does not contain two occurrences of the same class):

1. $\forall \omega_i \in \Omega$ do
   • Remove $\omega_i$ from $\Omega$.
   • Cut up $\omega_i$ into a set $\Omega_i = \{\omega_i^k\}$ of subsequences verifying equation 3.
2. $\forall \omega_i^k \in \Omega_i$
   • Add an occurrence of the $C^{start}$ and $C^{final}$ classes at the beginning and the end of $\omega_i^k$.
3. $\Omega = \bigcup_i \Omega_i$

An occurrence of an abstract start class $C^{start}$ and an occurrence of an abstract final class $C^{final}$ are added at the beginning and the end of each subsequence $\omega_i^k$ so that the BJT4G algorithm automatically identifies the process phases. For example, when applying Algorithm 3 to the three sequences of Table 1, the BJT4G algorithm will find two process phases. The second process phase will be: $\{R(C^{start}, C^A, [\tau^-, \tau^+]),\ R(C^A, C^E, [\tau^-, \tau^+]),\ R(C^E, C^B, [\tau^-, \tau^+]),\ R(C^B, C^{final}, [\tau^-, \tau^+])\}$.

It can easily be deduced from the definition of process phase that process phases depend on the sequence alignment chosen. A 'good' sequence alignment is necessary to correctly identify the different stages in the process.
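The cutting step of Algorithm 3 can be sketched as follows. The paper does not state where exactly a sequence is cut, so this example assumes a greedy rule: a new subsequence starts as soon as a class label already seen in the current subsequence occurs again. The labels `START` and `END` stand for the abstract start and final classes and, like the name `cut_into_phases`, are assumptions made for the illustration.

```python
START, END = "START", "END"  # hypothetical labels for the abstract classes


def cut_into_phases(sequence):
    """Cut a sequence so that no subsequence contains the same class twice."""
    phases, current, seen = [], [], set()
    for label in sequence:
        if label in seen:
            # The class would repeat: close the current phase and open a new one.
            phases.append([START] + current + [END])
            current, seen = [], set()
        current.append(label)
        seen.add(label)
    if current:
        phases.append([START] + current + [END])
    return phases


# First series of Table 1 after renaming (suffixes follow a hypothetical scheme).
renamed_sequence = ["A", "B", "F_1,3", "B_1,4", "E", "D_1,6", "C", "A", "E", "B"]
phases = cut_into_phases(renamed_sequence)
```

On this sequence the sketch produces two subsequences, the second one being `START, A, E, B, END`, in line with the second process phase described above.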

3.4 Potential Cycles

When looking at the model of Figure 1, it is clear that the classes C and D introduce a cycle. Cycles raise a serious problem of interpretation and make the resulting models hard to understand. This explains why a lot of work aims at avoiding cycles in process models (see, for example, (Cook and Wolf, 2004), (Schimm, 2004), (van der Aalst et al., 2004), (Pinter and Golani, 2004), (Weijters and van der Aalst, 2003) or (Agrawal et al., 1998)). But these works make assumptions about the process or impose constraints on the constitution of the sequences. In all cases, this amounts to having some a priori knowledge about the process to be modeled or about the set of programs that write the messages in the process database.

The aim of the TOM4L Approach is to provide models of sequences without any a priori knowledge about the process and the set of programs that have generated the occurrences. One difficulty is that cycles often appear when mining a process because of the transitivity property of the sequential binary relations.

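Property 1. The timed sequential binary relations $R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}])$ of a given abstract chronicle model $M = \{R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}])\}$ are transitive:

$$\forall R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}]) \in M \wedge \forall R(C^j, C^k, [\tau^-_{jk}, \tau^+_{jk}]) \in M,\quad R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}]) \wedge R(C^j, C^k, [\tau^-_{jk}, \tau^+_{jk}]) \Rightarrow \exists R(C^i, C^k, [\tau^-_{ik}, \tau^+_{ik}]) \quad (4)$$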

Definition 4. Given a process model $M$, two discrete event classes $C^i$ and $C^j$ are not ordered when:

$$M \vdash R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}]) \wedge M \vdash R(C^j, C^i, [\tau^-_{ji}, \tau^+_{ji}]) \quad (5)$$

Two classes $C^i$ and $C^j$ that cannot be ordered in a model will be denoted $C^i \equiv C^j$.

For example, the three sequences of Table 1 do not provide any order between the classes $C^C$ and $C^D$ (Figure 1). Consequently: $C^C \equiv C^D$.

Property 2. The set of discrete event classes $\mathcal{C}_k = \{C^1, \ldots, C^n\}$ cannot be ordered when:

$$\forall C^i \in \mathcal{C}_k,\ \forall C^j \in \mathcal{C}_k,\ C^i \equiv C^j \quad (6)$$

In graph theory, the classes of a set $\mathcal{C}_k$ form a strongly connected component. Algorithm 4 aims at detecting a potential cycle (i.e. a set $\mathcal{C}_k$ so defined):

1. Build a process model $M$ from $\Omega$ with the BJT4G algorithm.
2. Build the set $Ck = \{\mathcal{C}_k\}$ of the sets $\mathcal{C}_k$ of classes without order, using equation 5.
3. $\forall \mathcal{C}_k \in Ck$ do
   • Remove the relations $R(C^i, C^j, [\tau^-_{ij}, \tau^+_{ij}])$ of $M$ where $C^i \in \mathcal{C}_k$ or $C^j \in \mathcal{C}_k$.
   • Generate all the paths $P = \{P_l\}$ with $P_l = \{R(C^i, C^{i+1}, [\tau^-_{i,i+1}, \tau^+_{i,i+1}])\}$ where $C^i \in \mathcal{C}_k$ and $C^{i+1} \in \mathcal{C}_k$.
   • Insert the relations of the paths $P$ in $M$.

To avoid adding a priori knowledge about the process or the programs, Algorithm 4 computes all the paths linking the classes in $\mathcal{C}_k$ (cf. the model of Figure 1 with the classes C and D). It is clear that if $Card(\mathcal{C}_k) = n$, there are $n!$ possible paths. But it is a simple way to put the emphasis on potential cycles.
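The detection of classes without order can be sketched as follows, with the same simplification as before: the model is reduced to a set of untimed (source, target) pairs, and "the model entails a relation" is approximated by reachability under the transitivity of Property 1. The function names and the toy relations (the directly-follows pairs around C and D in the first phase of the example) are assumptions made for this illustration.

```python
from itertools import permutations


def reachable(relations):
    """Transitive closure: for each class, the classes that can follow it."""
    nodes = {n for edge in relations for n in edge}
    reach = {n: set() for n in nodes}
    for a, b in relations:
        reach[a].add(b)
    changed = True
    while changed:
        changed = False
        for a in nodes:
            new = set()
            for b in reach[a]:
                new |= reach[b]
            if not new <= reach[a]:
                reach[a] |= new
                changed = True
    return reach


def unordered_sets(relations):
    """Sets of classes that are mutually reachable, hence not ordered."""
    reach = reachable(relations)
    seen, result = set(), []
    for a in sorted(reach):
        if a in seen:
            continue
        group = {a} | {b for b in reach[a] if a in reach[b]}
        if len(group) > 1:
            result.append(sorted(group))
        seen |= group
    return result


# Untimed relations observed around C and D in the example sequences.
toy_relations = {("E", "D"), ("D", "C"), ("C", "A"),
                 ("E", "C"), ("C", "D"), ("D", "A")}
groups = unordered_sets(toy_relations)
# All candidate orderings of an unordered set: n! paths for n classes.
candidate_paths = list(permutations(groups[0]))
```

For the two classes C and D, the enumeration yields the 2! = 2 candidate paths C, D and D, C.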

3.5 Modeling a Large Scale Process

Algorithm 5 aims at modeling a large scale manufacturing process. It simply uses the four algorithms provided in the preceding subsections. Given a set of sequences $\Omega = \{\omega_i\}_{i=1,\ldots,n}$ and a sequence alignment $SA(\Omega)$, Algorithm 5 finds a process model $M$ with the BJT4G algorithm:

1. Rewrite the sequences of $\Omega$ with Algorithm 1.
2. Rewrite the sequences of $\Omega$ with Algorithm 2 using information from $SA(\Omega)$.
3. Produce the sets $\Omega_k$ of subsequences $\omega_i^k$ with Algorithm 3.
4. $\forall \Omega_k$
   • Build a process model $M_k$ of the phase $k$ with Algorithm 4.
5. $M = \bigcup_k M_k$

Applied to the sequences of Table 1, this algorithm provides the model of Figure 1. This algorithm has also been used to model the wafer manufacturing process of the Rousset (France) plant of the STMicroelectronics company.

4 APPLICATION

The aim of the STMicroelectronics company is to improve the control of the wafer manufacturing processes through the definition of human scale process models and a better knowledge of the timed constraints between the different manufacturing steps. A "wafer" is a silicon plate used for the construction of STMicroelectronics' products. A wafer manufacturing process is a series of elementary treatments called "recipes" that are executed on machines called "equipments". An "operation" is a particular series of recipes and each recipe is associated with different equipments, those that are qualified for performing the treatment. A complete series of operations is called a "manufacturing route". The STMicroelectronics plant situated in Rousset, in the south of France, has more than 10,000 different recipes, each manufacturing route is composed of about 400 operations and there are currently more than 300 equipments in the plant. The supervision system of the wafer manufacturing process describes a manufacturing route with messages providing the name of a recipe, the machine on which the recipe is executed, the corresponding operation and the start and finish times of the recipe.
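The content of such a message can be pictured with a small record type. The sketch below is purely illustrative: the field names, the identifiers and the dates are invented for the example and do not reflect the actual schema of the supervision database. It also shows how a route can be projected at the equipment level, keeping the start time of each recipe as the timestamp of the occurrence.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RecipeMessage:
    """One supervision message (hypothetical field names)."""
    recipe: str
    equipment: str
    operation: str
    start: datetime
    finish: datetime


def equipment_sequence(messages):
    """Order messages by start time and keep (equipment, start) occurrences."""
    ordered = sorted(messages, key=lambda m: m.start)
    return [(m.equipment, m.start) for m in ordered]


# Invented values, for illustration only.
messages = [
    RecipeMessage("R-002", "1031", "OP-02",
                  datetime(2010, 1, 4, 9, 30), datetime(2010, 1, 4, 10, 0)),
    RecipeMessage("R-001", "1010", "OP-01",
                  datetime(2010, 1, 4, 8, 0), datetime(2010, 1, 4, 9, 0)),
]
occurrences = equipment_sequence(messages)
```

The same projection could be made on the `recipe` or `operation` field to work at another level of granularity.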

To validate the approach presented in this paper, a set $\Omega$ containing 5 sequences $\omega_i$, $i \in \{1, \ldots, 5\}$, has been extracted from STMicroelectronics' databases. These sequences contain data from the real production of the company. For the construction of this example we considered an extract of each of these 5 sequences: only the first 200 executed recipes were used.

Algorithm 5 has been applied at the equipment level so that a process model $M$ represents a manufacturing route by a series of equipments. Even though we worked with extracts of real routes, we found a large volume of data. In the example constructed, 127 discrete event classes have been found. A class is defined with a singleton $C^i = \{(\phi_i, i)\}$ where the constant $i$ is a natural number in the interval $[1000, 1126]$. Each of these classes represents a particular equipment in STMicroelectronics' plant.

We illustrate the application of the approach proposed in this paper by showing how it works over two subsequences, taken from two of the sequences of $\Omega$ used for the example construction. These two subsequences can be seen in Figure 2.

Figure 2: Subsequence of the first sequence considered (top) and subsequence of the second one (bottom).

During the example construction, 67 different equivalent classes have been found. The equivalent classes created using Algorithm 1 are singletons of the form $C^j = \{(\phi_j, j)\}$ with $j \in [6000, 6066]$. Consider that the equivalent class $C^{6008} = \{(\phi_{6008}, 6008)\}$ contains the discrete event classes $\{C^{1030}, C^{1031}, C^{1032}, C^{1033}\}$ and that $\{C^{1010}, C^{1011}, C^{1012}\} \subseteq C^{6009}$, $\{C^{1050}, C^{1051}, C^{1052}\} \subseteq C^{6031}$, $\{C^{1040}, C^{1041}\} \subseteq C^{6040}$, $\{C^{1020}\} \subseteq C^{6042}$, $\{C^{1060}, C^{1061}\} \subseteq C^{6046}$, $\{C^{1080}, C^{1081}\} \subseteq C^{6056}$ and $\{C^{1070}, C^{1071}\} \subseteq C^{6065}$. Algorithm 1 rewrites the two subsequences of Figure 2 and produces the subsequences of Figure 3.

Figure 3: The two subsequences rewritten with Algorithm 1.

A hand-made sequence alignment has been used. The alignment corresponding to the two subsequences is shown in Figure 4.

Figure 4: The two subsequences aligned.

After applying Algorithm 2 for renaming classes according to the alignment shown in Figure 4, the two subsequences are rewritten as shown in Figure 5.

Figure 5: The two subsequences rewritten using Algorithm 2.

Algorithm 3 finds two process phases ($i$ and $i + 1$) for each of the two subsequences considered. Process phase $i$ can be seen in Figure 6 and process phase $i + 1$ is shown in Figure 7.

Figure 6: Process phase $i$ of the first (top) and second (bottom) subsequence.

Figure 7: Process phase $i + 1$ of the first (top) and second (bottom) subsequence.

Using Algorithm 4 over the phases found, the corresponding models were constructed. For the two subsequences used to illustrate the approach, we obtain the models shown in Figures 8 and 9.

Figure 8: Model for phase $i$ obtained from the two subsequences.

Figure 9: Model for phase $i + 1$ obtained from the two subsequences.

During the construction of the whole example model, 19 process phases were found. The resulting model contains a total of 302 nodes.

Figure 10 shows the resulting model constructed for the 5th phase found. The volume of data in this small example is large even though we worked with extracts of manufacturing routes. The model obtained using this method is, however, simple and easy to understand. The resulting model has already been validated by STMicroelectronics' experts.

The next step is to build a model for complete sequences and for bigger sets (more than 5 sequences). Afterwards, it would be valuable to construct models for other levels of granularity (the operation level or the recipe level are some options).

Figure 10: Phase 5 of the model.

5 CRITICAL EVALUATION / COMPARISON

The original idea of our method arose from the need for a general algorithm able to reduce the number of nodes when modeling large scale manufacturing processes. The idea was then to find equivalences between different data. The result of this idea was the introduction of the equivalent class concept. These equivalent classes have different interpretations depending on the level of granularity considered.

Most of the algorithms in the literature are good at creating models when the different activities that constitute a process are known. This would correspond to information at the recipe level in STMicroelectronics' manufacturing processes. However, if we consider only information about the machines used and the execution time of each recipe, as we did in Section 4, the existing algorithms of the process mining area will produce models with a high level of redundancy.

For illustrating the problem, we applied the meth-

ods proposed in (Agrawal et al., 1998) and (Cook and

Wolf, 1998) over the subsequences shown in Figure

2. The resulting models are shown in Figure 11. Af-

terwards, we compared the two models obtained with

the model constructed using our approach in section

4 (concatenation of submodels of Figures 8 and 9).

(a) Model constructed using the (Agrawal et al., 1998) approach

(b) Model constructed using the (Cook and Wolf, 1998) approach

Figure 11: Models constructed for subsequences of Figure

In Figure 11, each node represents a machine that executes a certain recipe. For the level of abstraction considered in the example, the methods proposed in (Agrawal et al., 1998) and (Cook and Wolf, 1998) tend to construct all possible paths between all the different machines capable of executing the different recipes in the manufacturing processes. By contrast, the nodes in the model constructed using our approach are associated with the equivalence classes found. Each equivalence class represents a group of machines capable of executing the same subset of recipes. The model constructed using our approach is therefore smaller than the ones obtained with the other methods for this level of abstraction. In addition, the resulting model helps the experts concentrate on the task carried out in each step rather than on the machine used for it.
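The grouping of machines into classes described above can be illustrated with a minimal sketch. It assumes that a table mapping each machine to the recipes it can execute is available; the machine names, recipe names and the function `equivalence_classes` are hypothetical and are not taken from the STMicroelectronics data or from the TOM4L implementation.

```python
from collections import defaultdict


def equivalence_classes(capabilities):
    """Group machines that can execute exactly the same set of recipes.

    `capabilities` maps a machine name to an iterable of recipe names
    (an assumed input format, used for illustration only).
    """
    groups = defaultdict(list)
    for machine, recipes in capabilities.items():
        # Machines with an identical recipe set fall into the same class.
        groups[frozenset(recipes)].append(machine)
    return sorted(sorted(machines) for machines in groups.values())


# Hypothetical capability table: two process steps served by five machines.
capabilities = {
    "M1": {"etch_a", "etch_b"},
    "M2": {"etch_a", "etch_b"},
    "M3": {"etch_a", "etch_b"},
    "M4": {"clean"},
    "M5": {"clean"},
}

classes = equivalence_classes(capabilities)

# A machine-level model needs one edge per pair of machines in consecutive
# steps, whereas a class-level model needs a single edge between classes.
machine_level_edges = len(classes[0]) * len(classes[1])
class_level_edges = 1

print(classes)
print(machine_level_edges, class_level_edges)
```

In this toy case, five machine nodes and six machine-to-machine edges collapse into two class nodes and one edge, which is the kind of reduction discussed above.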

Another drawback of using the approach proposed in (Agrawal et al., 1998) to model STMicroelectronics' processes stems from its use of process graphs. In this type of model, the edges represent control conditions and the vertices represent activities, and any activity appears only once in a process graph as a vertex label. STMicroelectronics' manufacturing route definitions, however, involve the repetition of several tasks. These repetitions are controlled: the re-execution of a task is only possible for certain states of the manufactured objects and not for others. If we try to use process graphs to model these manufacturing processes, identifying the states of the manufactured objects becomes difficult, if not impossible.
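The loss of state information can be seen in a small sketch. It uses a simplified graph with one vertex per activity label and one edge per observed succession; this is an illustrative simplification, not the mining algorithm of (Agrawal et al., 1998), and the route and task names are hypothetical.

```python
def process_graph(trace):
    """Build a graph with one vertex per activity label.

    The graph is returned as the set of (predecessor, successor) pairs
    observed in the trace.
    """
    return {(a, b) for a, b in zip(trace, trace[1:])}


def accepts(edges, trace, start, end):
    """Return True if `trace` is a path from `start` to `end` in the graph."""
    if not trace or trace[0] != start or trace[-1] != end:
        return False
    return all((a, b) in edges for a, b in zip(trace, trace[1:]))


# Hypothetical route in which "etch" is repeated exactly once, after "inspect".
route = ["deposit", "etch", "inspect", "etch", "pack"]
edges = process_graph(route)

# The defined route is accepted, as expected.
route_ok = accepts(edges, route, "deposit", "pack")

# Because "etch" is a single vertex, the graph also accepts a route that
# skips the inspection and a route that repeats the etch/inspect loop.
skipped = accepts(edges, ["deposit", "etch", "pack"], "deposit", "pack")
looped = accepts(
    edges,
    ["deposit", "etch", "inspect", "etch", "inspect", "etch", "pack"],
    "deposit",
    "pack",
)

print(route_ok, skipped, looped)
```

Since the first and the second execution of the repeated task share one vertex, the graph cannot tell in which state the manufactured object is when that task is reached.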

6 CONCLUSIONS

This paper presents the TOM4L Approach for modeling manufacturing processes from timed data contained in a supervision system database. One of the interesting features of this approach is the notion of discrete event class. This notion is used to define a process phase concept and discrete event equivalence classes, which are required for large scale manufacturing processes. The definition of these concepts leads to a global algorithm that has been applied to modeling STMicroelectronics' (Rousset) manufacturing processes. This concrete application shows the operational nature of the extensions of the TOM4L Approach framework. These models will be a valuable tool for STMicroelectronics to control production and to alert experts when the real activity of the company does not follow their theoretical definitions.

7 CURRENT WORKS

Current work is devoted to adapting the sequence alignment algorithms used in genetics (Notredame et al., 2000). The objective is to find an analogue of the work presented in (Gauthier et al., 2008), which was carried out in the social sciences. Our idea is to find a way to calculate similarity values between different event classes in any kind of sequence, without introducing expert knowledge about their contents.
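As a point of reference for this direction, the sketch below computes a global alignment score between two sequences of event classes, in the style of (Needleman and Wunsch, 1970). The fixed `match`, `mismatch` and `gap` values are assumptions made for illustration; deriving such similarity values from the data is precisely the open question stated above, and the sketch does not address it. The class names are hypothetical.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences s and t.

    Dynamic programming over a (len(s)+1) x (len(t)+1) table, where
    score[i][j] is the best score for aligning s[:i] with t[:j].
    """
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix with the empty sequence costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align s[i-1] with t[j-1]
                score[i - 1][j] + gap,       # gap in t
                score[i][j - 1] + gap,       # gap in s
            )
    return score[-1][-1]


# Hypothetical sequences of event classes.
reference = ["C1", "C2", "C3"]
observed = ["C1", "C3"]

same = global_alignment_score(reference, reference)
different = global_alignment_score(reference, observed)
print(same, different)
```

With these assumed scores, identical sequences obtain the highest possible value, and each missing or substituted event class lowers it.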

REFERENCES

Agrawal, R., Gunopulos, D., and Leymann, F. (1998). Mining process models from workflow logs. In Sixth International Conference on Extending Database Technology, pages 469–483.

Benayadi, N., Le Goc, M., and Bouché, P. (2008). Using the stochastic approach framework to model large scale manufacturing processes. Proceedings of the 3rd International Conference on Software and Data Technologies (ICSoft 2008).

Cook, E. J. and Wolf, A. L. (1998). Discovering models of software processes from event-based data. ACM Transactions on Software Engineering and Methodology, 7:215–249.

Cook, E. J. and Wolf, A. L. (2004). Event-based detection of concurrency. In Proceedings of the 6th ACM SIGSOFT International Symposium on Foundations of Software Engineering, volume 53, pages 35–45.

Gauthier, J. A., Widmer, E. D., Bucher, P., and Notredame, C. (2008). How much does it cost? Optimization of costs in sequence analysis of social science data. Sociological Methods and Research.

Ghallab, M. (1996). On chronicles: Representation, on-line recognition and learning. Proc. Principles of Knowledge Representation and Reasoning, Aiello, Doyle and Shapiro (Eds.), Morgan Kaufmann, pages 597–606.

Le Goc, M. (2006). Notion d'observation pour le diagnostic des processus dynamiques: Application à Sachem et à la découverte de connaissances temporelles. HDR, Faculté des Sciences et Techniques de Saint Jérôme.

Le Goc, M., Bouché, P., and Giambiasi, N. (2005). Stochastic modeling of continuous time discrete event sequence for diagnosis. 16th International Workshop on Principles of Diagnosis (DX'05), California, USA.

Needleman, S. B. and Wunsch, C. D. (1970). A general method applicable to the search of similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48:443–453.

Notredame, C., Higgins, D. G., and Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302:205–217.

Pinter, S. and Golani, M. (2004). Discovering workflow models from activities' lifespans. In Special issue: Process/workflow mining, volume 53, pages 283–296.

Schimm, G. (2004). Mining exact models of concurrent workflows. In Computers in Industry, volume 53(3), pages 265–281.

Smith, T. F. and Waterman, M. S. (1981). Identification of common molecular subsequences. J. Mol. Biol., 147:195–197.

Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673–4680.

van der Aalst, W., Weijters, T., and Maruster, L. (2004). Workflow mining: Discovering process models from event logs. In IEEE Transactions on Knowledge and Data Engineering, volume 16, pages 1128–1142.

van der Aalst, W. M. P. and Weijters, A. J. M. M. (2004). Process mining. Special issue of Computers in Industry, 53:231–244.

Weijters, A. and van der Aalst, W. (2003). Rediscovering workflow models from event-based data using Little Thumb. In Integrated Computer-Aided Engineering, volume 10(2), pages 151–162.
