EFFICIENT LEARNING OF DYNAMIC BAYESIAN NETWORKS
FROM TIMED DATA
Ahmad Ahdab and Marc Le Goc
LSIS, UMR CNRS 6168, Université Paul Cézanne, Domaine Universitaire St Jérôme, 13397 Marseille cedex 20, France
Keywords: Machine Learning, Bayesian network, Stochastic representation, Data mining, Knowledge discovery.
Abstract: This paper addresses the problem of learning a Dynamic Bayesian network from timed data without prior
knowledge of the system. One of the main problems of learning a Dynamic Bayesian network is building
and orienting the edges of the network while avoiding loops. The problem is more difficult when data are timed.
This paper proposes an algorithm based on an adequate representation of a set of sequences of timed data
and on an information-based measure of the relation between two nodes. This algorithm is a part of the
Timed Observation Mining for Learning (TOM4L) process that is based on the Theory of Timed
Observations. The paper illustrates the algorithm with an application to the Apache system of the
Arcelor-Mittal Steel Group, a real world knowledge based system that diagnoses a galvanization bath.
1 INTRODUCTION
This paper describes the BJM4BN algorithm (BJ-Measure for Bayesian Networks) that learns a Dynamic Bayesian network from timed data without prior knowledge of the process that generates the timed data. Most of the contributions to learning a Bayesian network are based on un-timed data. The main difficulties are orienting the edges of the resulting graph and building the conditional probability tables. These problems are more difficult when data are timed.
The BJM4BN algorithm proposes an efficient solution to these problems when data are timed. The solution is based on a particular representation of the timed data called the Stochastic Representation. This representation is the basis of the Theory of Timed Observations (Le Goc, 2006). This theory defines a learning process called Timed Observation Mining for Learning (TOM4L) (Le Goc, 2005). The TOM4L process aims at discovering temporal knowledge about a set of time functions x_i(t) considered as a dynamic system X(t) = {x_i(t)} called a process. To this aim, the Theory of Timed Observations defines an entropic measure called the BJ-Measure (Benayadi, 2008) that evaluates the flow of information between two nodes in a graph and thus provides an efficient means to orient the edges.
The next section presents a very short state of the art about learning Dynamic Bayesian networks (DBN). Section 3 introduces the basis of the Stochastic Representation of TOM4L and the BJ-Measure. Section 4 presents the learning principles, Section 5 describes the BJM4BN algorithm and Section 6 shows a real life application of the algorithm. Section 7 concludes the paper.
2 RELATED WORKS
A BN is a couple <G, θ> where G denotes a Directed Acyclic Graph in which the nodes represent the variables and the edges represent the dependencies between the variables (Pearl, 1988), and θ is the set of Conditional Probability Tables (CP Tables) defining the conditional probabilities of the values of a variable given the values of its upstream variables in G. BN learning algorithms aim at discovering the couple <G, θ> from a given data base. BN learning algorithms fall into two main categories: “search and scoring” and “dependency analysis” algorithms. The “search and scoring” learning algorithms can be used when the knowledge of the edge orientation between the variables of the system is given (Cooper, 1992), (Heckerman, 1997).
To avoid this problem, dependency analysis algorithms use conditional independence tests (Cheng, 1997), (Cheeseman, 1995), (Friedman, 1998), (Myers et al., 1999). But the number of such tests grows exponentially, which makes the computation time prohibitive (Chickering, 1994).
For example, Cheng’s algorithm (Cheng, 1997)
for learning BN is based on the d-separation concept
of (Pearl, 1988) to infer the structure G of the
Bayesian Network, and the mutual information I(X,
Y) (eq. 1) to detect conditional independence
relations.
I(X, Y) = Σ_i Σ_j P(X=x_i, Y=y_j) · log [ P(X=x_i, Y=y_j) / (P(X=x_i) · P(Y=y_j)) ]    (1)
The mutual information I(X, Y) is used to evaluate all the potential pairs of variables (X, Y) and to produce a list L sorted in descending order: pairs with higher mutual information are supposed to be more related than those with low mutual information values. The list L is then pruned given an arbitrary parameter ε: each pair (X, Y) such that I(X, Y) < ε is eliminated from L. In real world applications, list L should be kept as small as possible using the ε parameter. The two main limitations of such approaches are the need to define the ε parameter and the exponential number of Conditional Independence tests required to orient the edges of the graph.
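As an illustration of this pruning scheme (not part of the original TOM4L implementation; the toy data and the epsilon value are assumptions), the following Python sketch computes the mutual information of eq. (1) for every pair of discrete variables and keeps only the pairs above the ε threshold, sorted in descending order:

from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(xs, ys):
    """I(X, Y) over two aligned sequences of discrete values (eq. 1)."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def candidate_pairs(data, epsilon):
    """Return pairs sorted by decreasing I(X, Y), pruned by the epsilon threshold."""
    scored = [((a, b), mutual_information(data[a], data[b]))
              for a, b in combinations(data, 2)]
    return sorted([(p, mi) for p, mi in scored if mi >= epsilon],
                  key=lambda item: -item[1])

# Toy example: three discrete variables observed at the same instants.
data = {"x1": [0, 0, 1, 1, 0, 1], "x2": [0, 0, 1, 1, 0, 1], "x3": [1, 0, 0, 1, 1, 0]}
print(candidate_pairs(data, epsilon=0.05))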
3 TOM4L FRAMEWORK
The BJM4BN algorithm provides a solution to these two problems that is based on the Stochastic Representation of the TOM4L framework (Le Goc, 2005), (Le Goc, 2006), (Le Goc, 2008).
In this framework, a message timed at t_k contained in a database is an occurrence C_i(k) of an observation class C_i = {(x_i, δ_i)}, which is an arbitrary set of couples (x_i, δ_i) where δ_i is one of the discrete values of a variable x_i. An observation class is often a singleton because, in that case, two classes C_i = {(x_i, δ_i)} and C_j = {(x_j, δ_j)} are linked through the variables x_i and x_j only when the constants δ_i and δ_j are dependent (Le Goc, 2006).
The Stochastic Representation of a sequence ω = (…, C_i(k), …) of m occurrences C_i(k) defining a set Cl = {C_i} of n timed observation classes is a set of matrices (Le Goc, 2005), (Bouché, 2005) from which the BJ-Measure is computed (Benayadi, 2008). This measure is based on the Kullback-Leibler distance D(P(Y|X=C_i) || P(Y)) that evaluates the relation between the conditional probability distribution of Y knowing that X = C_i and the prior probability distribution of Y. The BJ-measure decomposes the Kullback-Leibler distance into two terms around the independence point P(Y|X=C_i) = P(Y) (i.e. D(P(Y|X=C_i) || P(Y)) = 0).
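To make the representation concrete, here is a minimal Python sketch (an illustration under the definitions above; the tuple-based encoding of ω and the function name are assumptions, not the TOM4L implementation) that builds the counting matrix N = [n_ij] of a timed sequence, i.e. the number of couples (C_i(k), C_j(k+1)) read in ω. This is the kind of matrix from which the probabilities used by the BJ-measure are estimated.

from collections import defaultdict

def counting_matrix(omega):
    """Count the successive couples (C_i(k), C_j(k+1)) in a timed sequence omega.

    omega is a list of (timestamp, class_name) tuples sorted by timestamp.
    Returns a dict of dicts n[ci][cj] = number of times ci is immediately
    followed by cj in the sequence.
    """
    n = defaultdict(lambda: defaultdict(int))
    for (_, ci), (_, cj) in zip(omega, omega[1:]):
        n[ci][cj] += 1
    return n

# Toy sequence of timed observations (timestamp, observation class).
omega = [(0.5, "C1"), (1.2, "C2"), (2.0, "C1"), (3.1, "C3"), (3.4, "C2")]
N = counting_matrix(omega)
print(N["C1"]["C2"])  # -> 1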
The BJL-measure BJL(C_i → C_j) of a binary relation r(C_i → C_j) is the right part of the Kullback-Leibler distance D(P(Y|X=C_i) || P(Y)) so that:
P(Y=C_j | X=C_i) < P(Y=C_j) ⟹ BJL(C_i → C_j) = 0
P(Y=C_j | X=C_i) ≥ P(Y=C_j) ⟹ BJL(C_i → C_j) = D(P(Y|X=C_i) || P(Y))
BJL(C_i → C_j) is not null when the observation C_i(k) provides some information about the observation C_j(k). Symmetrically, when BJL(C_i → ¬C_j) is not null, the observation C_i(k) provides some information about ¬C_j(k). The BJL-measure BJL(C_i → ¬C_j) of a binary relation r(C_i → ¬C_j) is then the left part of the Kullback-Leibler distance:
P(Y=C_j | X=C_i) < P(Y=C_j) ⟹ BJL(C_i → ¬C_j) = D(P(Y|X=C_i) || P(Y))
P(Y=C_j | X=C_i) ≥ P(Y=C_j) ⟹ BJL(C_i → ¬C_j) = 0
Consequently:
D(P(Y|X=C_i) || P(Y)) = BJL(C_i → C_j) + BJL(C_i → ¬C_j)    (2)
Similarly, the BJW-measure evaluates the information distribution between the predecessors (C_i(k) or ¬C_i(k)) of an observation C_j(k+1) at time t_k+1:
D(P(X|Y=C_j) || P(X)) = BJW(C_i → C_j) + BJW(C_i → ¬C_j)    (3)
Because P(C_j|C_i) < P(C_j) ⟺ P(C_i|C_j) < P(C_i), the two measures are null at the same independence point and can be combined in a single measure called the BJM-measure, which combines the norm of the vector (BJL(C_i → C_j), BJW(C_i → C_j)) with the norm of the vector (BJL(C_i → ¬C_j), BJW(C_i → ¬C_j)):
M(C_i → C_j) = √(BJL(C_i → C_j)² + BJW(C_i → C_j)²) − √(BJL(C_i → ¬C_j)² + BJW(C_i → ¬C_j)²)    (4)
The BJ-Measure is no longer justifiable when the ratio θ_i,j = n_i / n_j is greater than 4 or less than 1/4 (Benayadi, 2008). This property is called the θ property.
When a BJ-measure M(C_i → C_j) is positive, the timed observation distribution of the C_i class brings information about the timed observation distribution of the C_j class. So, considering the positive values only, the BJ-measure M(C_i → C_j) satisfies the four following properties:
1. Dissymmetry: M(C_i → C_j) ≠ M(C_j → C_i) (generally)
2. Positivity: ∀ C_i, C_j, M(C_i → C_j) ≥ 0
3. Independence: M(C_i → C_j) = 0 ⟺ C_i and C_j are independent (i.e. P(C_j|C_i) = P(C_j))
4. Triangular inequality: M(C_i → C_j) < M(C_i → C_k) + M(C_k → C_j)
This latter property can be used to reason with the BJ-measure to deduce the structure of a dynamic Bayesian network.
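Under this reading of equations (2)–(4) (taking the Kullback-Leibler distance over the binary partition {C_j, ¬C_j} and estimating the probabilities from a counting matrix n[ci][cj]; the function names are assumptions, not the TOM4L code), a minimal Python sketch of the BJ-measure could be:

from math import log2, sqrt

def kl_binary(p_cond, p_prior):
    """D(P(.|cond) || P(.)) over the binary partition {C_j, not C_j}."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * log2(a / b)
    return term(p_cond, p_prior) + term(1.0 - p_cond, 1.0 - p_prior)

def bj_measure(n, ci, cj):
    """BJ-measure M(ci -> cj) computed from a counting matrix n[ci][cj]."""
    total = sum(sum(row.values()) for row in n.values())
    row_i = sum(n[ci].values())                        # couples starting with ci
    col_j = sum(row.get(cj, 0) for row in n.values())  # couples ending with cj
    p_j, p_i = col_j / total, row_i / total
    p_j_given_i = n[ci].get(cj, 0) / row_i if row_i else 0.0
    p_i_given_j = n[ci].get(cj, 0) / col_j if col_j else 0.0
    d_l = kl_binary(p_j_given_i, p_j)                  # D(P(Y|X=ci) || P(Y))
    d_w = kl_binary(p_i_given_j, p_i)                  # D(P(X|Y=cj) || P(X))
    # Split each distance around the independence point P(Cj|Ci) = P(Cj), eq. (2) and (3).
    bjl, bjl_neg = (d_l, 0.0) if p_j_given_i >= p_j else (0.0, d_l)
    bjw, bjw_neg = (d_w, 0.0) if p_i_given_j >= p_i else (0.0, d_w)
    # Combine the two measures as in eq. (4).
    return sqrt(bjl ** 2 + bjw ** 2) - sqrt(bjl_neg ** 2 + bjw_neg ** 2)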
4 LEARNING PRINCIPLES
Let us consider a set R = {…, r(C_i → C_j), …} of n binary relations. The operation that removes a binary relation r(C_i → C_j) from the set R is denoted Remove(r(C_i → C_j)): R ← R − {r(C_i → C_j)}.
The positivity property leads to remove the r(C_i → C_j) relations having a negative value of the BJ-measure:
Rule 1 (“Positivity rule”): ∀ r(C_i → C_j) ∈ R, M(C_i → C_j) ≤ 0 ⟹ Remove(r(C_i → C_j))
The dissymmetry property allows deducing the orientation of a hypothetical relation between two timed observation classes C_i and C_j:
Rule 2 (Orientation rule): ∀ r(C_i → C_j), r(C_j → C_i) ∈ R, M(C_i → C_j) > M(C_j → C_i) ⟹ Remove(r(C_j → C_i))
Figure 1: Loops.
Now, let us consider a set R = {r(C_i → C_i+1), r(C_i+1 → C_i+2), ..., r(C_i+n → C_j), r(C_j → C_i)} of n+2 binary relations defining a loop (Figure 1) where:
∀ r(C_x → C_y) ∈ R, M(C_x → C_y) > 0
The problem with the set R is that computing the distribution of a class C_x requires knowing its own distribution: loops must then be avoided. In other words, a relation r(C_i → C_j) must be removed from R to break the loop. To solve this problem, the idea is to use the monotonous property of the BJ-measure: finding two classes C_i and C_j so that the BJ-measure of the relation r(C_i → C_j) is the lowest of the loop:
Rule 3 (Loop Rule): ∀ r(C_x → C_y) ∈ R, ∃ r(C_i → C_j) ∈ R, x ≠ i, y ≠ j,
M(C_x → C_y) > M(C_i → C_j) ⟹ Remove(r(C_i → C_j))
When M(C_x → C_y) = M(C_i → C_j), any of the relations can be removed. The extreme case of a loop can be found in a set R containing a reflexive relation r(C_i → C_i) where M(C_i → C_i) > 0. Rule 3 must then be adapted to this extreme (but frequent) case:
Rule 4 (Reflexivity rule): ∀ r(C_i → C_i) ∈ R, M(C_i → C_i) > 0 ⟹ Remove(r(C_i → C_i))
Finally, to build naïve Bayesian Networks, the algorithm must avoid multiple paths leading to a same C_i class (Figure 2). To avoid this problem, the idea is again to use the monotonous property of the BJ-measure: finding two classes C_i and C_j so that the BJ-measure of the relation r(C_i → C_j) is the lowest of the paths. To use this idea, all the paths leading to a particular C_i class must be found in R. Let us suppose that R contains n paths R_1 ⊆ R, R_2 ⊆ R, …, R_n ⊆ R leading to the C_i class (i.e. each R_i is of the form R_i = {r(C_i → C_k-n), r(C_k-n → C_k-n+1), ..., r(C_k → C_j), r(C_i → C_l-n), r(C_l-n → C_l-n+1), ..., r(C_l → C_j)}). The algorithm must find the r(C_i → C_j) relation with the lowest BJ-measure and remove it from R (“Transitivity rule”):
Rule 5 (Transitivity rule): ∀ r(C_x → C_y) ∈ R_1 ∪ R_2 ∪ … ∪ R_n, ∃ r(C_i → C_j) ∈ R_1 ∪ R_2 ∪ … ∪ R_n, x ≠ i, y ≠ j,
M(C_x → C_y) > M(C_i → C_j) ⟹ Remove(r(C_i → C_j))
Figure 2: Multiple Paths.
These five rules are necessary (but not sufficient) to design an algorithm, but its efficiency depends mainly on the number of relations in the initial set R. The TOM4L framework provides the mathematical tools to remove the relations that cannot play a significant role in the building of a naïve Bayesian Network. Consider the set R = {…, r(C_i → C_j), …} of n binary relations that can be built from a sequence ω of timed observations C_i(k) defining a set C = {C_x} of N(C) classes C_x. The size of the Stochastic Representation matrix of the TOM4L framework is then N(C) × N(C) = N(C)². This provides two ways to eliminate a relation r(C_i → C_j) having no interest for building a naïve Bayesian Network:
Test 1: P(C_j|C_i) · P(C_i, C_j) ≤ 1/N(C)³ ⟹ Remove(r(C_i → C_j))
This first test compares b_ij = P(C_j|C_i) · P(C_i, C_j) with the hazard of having an occurrence C_i(k) of the C_i class at time t_k knowing that there is an occurrence C_j(k+1) of the C_j class occurring at time t_k+1. Because ω defines N(C) classes, the a priori probability of having an occurrence C_i(k) of a C_i class followed by an occurrence C_j(k+1) of the C_j class is P(C_j|C_i) = 1/N(C) and the probability of reading a couple (C_i(k), C_j(k+1)) in ω is P(C_i, C_j) = 1/N(C)². So each b_ij value can be compared with the “absolute” hazard 1/(N(C) · N(C)²) = 1/N(C)³.
Test 2: P(C_j|C_i) · P(C_i, C_j) ≤ (1/N(C)) · P(C_i) · P(C_j) ⟹ Remove(r(C_i → C_j))
This second test defines the hazard when supposing that the C_i and C_j classes are independent. In that case, the probability of having a couple (C_i(k), C_j(k+1)) in ω is P(C_i) · P(C_j) and, having an occurrence C_i(k) of a C_i class, the hazard is to read any occurrence C_j(k+1) of the C_j class: P(C_j|C_i) = 1/N(C). So each b_ij value can be compared with the “relative” hazard (1/N(C)) · P(C_i) · P(C_j).
The θ property of the BJ-measure completes these two tests to eliminate the relations having no meaning according to the BJ-measure:
Test 3: θ_i,j > 4 ∨ θ_i,j < 1/4 ⟹ Remove(r(C_i → C_j))
Within the TOM4L framework, these three tests are implemented in the F0/1 = [f_ij] matrix:
(b_ij > 1/N(C)³) ∧ (b_ij > (1/N(C)) · P(C_i) · P(C_j)) ∧ (1/4 ≤ θ_i,j ≤ 4) ⟹ f_ij = 1
This leads to rule number 6:
Rule 6: ∀ r(C_i → C_j) ∈ R, f_ij = 0 ⟹ Remove(r(C_i → C_j))
These six rules are used by the BJM4BN
algorithm to build a naïve Bayesian Network.
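As an illustration, a minimal Python sketch of the F0/1 filter under the notations above (with b_ij = P(C_j|C_i) · P(C_i, C_j) and θ_i,j = n_i/n_j estimated from the counting matrix N; the function name and the dictionary-of-dictionaries representation of N are assumptions, not the TOM4L code):

def f01_entry(n, n_classes, ci, cj):
    """Return 1 if the relation r(ci -> cj) passes Tests 1 to 3, else 0.

    n is the counting matrix as a dict of dicts: n[ci][cj] = number of couples
    (ci(k), cj(k+1)) in the sequence; n_classes is N(C), the number of classes.
    """
    total = sum(sum(row.values()) for row in n.values())
    n_i = sum(n.get(ci, {}).values())        # occurrences of ci (row total)
    n_j = sum(n.get(cj, {}).values())        # occurrences of cj (row total)
    n_ij = n.get(ci, {}).get(cj, 0)
    if n_i == 0 or n_j == 0:
        return 0
    p_i, p_j = n_i / total, n_j / total
    b_ij = (n_ij / n_i) * (n_ij / total)     # P(cj|ci) * P(ci, cj)
    theta_ij = n_i / n_j
    test1 = b_ij > 1.0 / n_classes ** 3      # above the "absolute" hazard
    test2 = b_ij > (p_i * p_j) / n_classes   # above the "relative" hazard
    test3 = 0.25 <= theta_ij <= 4.0          # the theta property
    return 1 if (test1 and test2 and test3) else 0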
5 THE BJM4BN ALGORITHM
The BJM4BN algorithm takes as inputs a sequence ω of m timed observations C_i(k) defining a set Cl = {C_x} of N(Cl) classes C_x and an output class C_j, the class for which the DBN is computed. It produces a set G = {…, r(C_i → C_j), …} of n binary relations that form the structure of a naïve Bayesian Network (G, Θ).
The “BJM4BN” algorithm contains six stages. The first stage computes the Stochastic Representation of ω to produce the initial M = [m_ij] matrix containing the BJ-measure values of the N(Cl)² binary relations r(C_i → C_j) defined by ω (line 1).
// Stage 1
1. Compute the M = [m_ij] matrix
2. ∀ i = 0…N(Cl), ∀ j = 0…N(Cl), f_ij = 0
3. ∀ i = 0…N(Cl), ∀ j = 0…N(Cl),
   (b_ij > 1/N(C)³) ∧ (b_ij > (1/N(C)) · P(C_i) · P(C_j)) ∧ (1/4 ≤ θ_i,j ≤ 4) ⟹ f_ij = 1
4. M = M ⊗ F0/1 // element-wise product, m_ij = m_ij · f_ij
5. ∀ i = 0…N(Cl), ∀ j = 0…N(Cl),
   5.1. m_ij ≤ 0 ⟹ m_ij = 0 // rule 1
   5.2. i = j ⟹ m_ij = 0 // rule 4
Next, the F0/1 = [f_ij] matrix is computed using the three tests (line 3), and the new values m_ij = m_ij · f_ij of the matrix M = [m_ij] implement rule 6 (line 4). Finally, the M matrix is normalized using rules 1 (line 5.1) and 4 (line 5.2).
// Stage 2
6. L = {∅}
7. ∀ i = 0…N(Cl), ∀ j = 0…N(Cl), m_ij > 0 ⟹ L = L + {(r(C_i → C_j), m_ij)}
Stage 2 computes the list L from the normalized matrix M. The list L = {(r(C_i → C_j), m_ij)} contains couples of the form (r(C_i → C_j), m_ij) where m_ij is the BJ-measure value of the associated binary relation r(C_i → C_j). This list will be used to find the relation r(C_x → C_y) such that m_xy is the minimal value of the BJ-measures contained in L.
// Stage 3
8. C_x = C_j, G = {∅}
9. ∀ r(C_y → C_x) ∈ L ⟹ G = G + {r(C_y → C_x)}
10. Build(G, C_x) {
      ∀ r(C_y → C_x) ∈ G, ∀ r(C_z → C_y) ∈ L,
        G = G + {r(C_z → C_y)}
        Build(G, C_y)
    } // End Build Function
Stage 3 recursively builds the initial G graph from the C_j class. This stage uses a recursive function called Build(G, C_x), where C_x is the class whose graph is to be built.
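A possible Python rendering of Stage 3 (an interpretation of the pseudo-code above, with L and G represented as plain Python containers; the names are assumptions): starting from the output class, every relation of L that leads to a class already reached is recursively pulled into G.

def build_initial_graph(l_relations, output_class):
    """Stage 3: recursively collect the relations of L that lead to output_class.

    l_relations: dict mapping (source, target) -> BJ-measure value (the list L).
    Returns the set of edges (source, target) of the initial G graph.
    """
    g = set()

    def build(cx):
        for (cz, cy) in list(l_relations):
            if cy == cx and (cz, cy) not in g:
                g.add((cz, cy))
                build(cz)   # recurse on the newly reached class

    build(output_class)
    return g

# Toy list L: edges with their BJ-measure values.
L = {("C2", "C1"): 0.4, ("C3", "C2"): 0.2, ("C4", "C3"): 0.1, ("C5", "C9"): 0.3}
print(build_initial_graph(L, "C1"))  # the three edges leading, directly or transitively, to "C1"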
// Stage 4
11. R = {∅}
12. ∀ R_i ⊆ G, R_i ≡ {r(C_i → C_i+1), r(C_i+1 → C_i+2), ..., r(C_i+n → C_j), r(C_j → C_i)} ⟹ R = R + {R_i}
13. ∀ R_i ∈ R, ∀ r(C_x → C_y) ∈ R_i, r(C_x → C_y) ∉ L_1 ⟹ L_1 = L_1 + {(r(C_x → C_y), m_xy)}
14. While R ≠ {∅} repeat
      r(C_x → C_y) ∈ L_1, m_xy = Min(m_ij, L_1)
      ∀ R_i ∈ R, r(C_x → C_y) ∈ R_i ⟹ R = R − {R_i}
      G = G − {r(C_x → C_y)}
      L_1 = L_1 − {(r(C_x → C_y), m_xy)}
Stage 4 finds and removes the loops in G with Rule 3. This stage finds all the loops R_i in G of the form R_i ≡ {r(C_i → C_i+1), r(C_i+1 → C_i+2), ..., r(C_i+n → C_j), r(C_j → C_i)} and puts them in a set R (line 12). Next, a new list L_1 is built containing all the relations r(C_x → C_y) of R with their associated m_xy BJ-measure values (line 13). Every loop R_i in R is then removed using Rule 3 and the resulting G graph is updated (line 14). At the end of this stage, G contains no more loops. It is to be noted that, the L_1 list being global (i.e. containing all the relations r(C_x → C_y) participating in a loop), it is guaranteed that the set of removed relations r(C_x → C_y) is optimal: it is minimal and the removed relations are the smallest of the G graph.
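Stage 4 can be illustrated with the following simplified Python sketch (an interpretation of Rule 3 rather than the exact stage 4 procedure: it searches cycles with a plain depth-first search and removes one weakest edge at a time instead of working from the global L_1 list):

def find_cycle(edges):
    """Return one cycle as a list of edges, or None if the graph is acyclic."""
    succ = {}
    for (u, v), _ in edges.items():
        succ.setdefault(u, []).append(v)

    def dfs(node, stack):
        for nxt in succ.get(node, []):
            if nxt in stack:                     # back edge closes a cycle
                i = stack.index(nxt)
                path = stack[i:] + [nxt]
                return list(zip(path, path[1:]))
            found = dfs(nxt, stack + [nxt])
            if found:
                return found
        return None

    for start in list(succ):
        cycle = dfs(start, [start])
        if cycle:
            return cycle
    return None

def remove_loops(edges):
    """Rule 3: break every cycle by dropping its weakest (lowest BJ-measure) edge."""
    edges = dict(edges)
    while (cycle := find_cycle(edges)) is not None:
        weakest = min(cycle, key=lambda e: edges[e])
        del edges[weakest]
    return edges

# Toy graph with one loop C1 -> C2 -> C3 -> C1.
G = {("C1", "C2"): 0.5, ("C2", "C3"): 0.3, ("C3", "C1"): 0.05, ("C3", "C4"): 0.2}
print(remove_loops(G))   # the weakest edge ("C3", "C1") is removed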
// Stage 5
15. R = {∅}
16. ∀ R_i ⊆ G, R_i ≡ {r(C_i → C_k-n), r(C_k-n → C_k-n+1), ..., r(C_k → C_j), r(C_i → C_l-n), r(C_l-n → C_l-n+1), ..., r(C_l → C_j)} ⟹ R = R + {R_i}
17. ∀ R_i ∈ R, ∀ r(C_x → C_y) ∈ R_i, r(C_x → C_y) ∉ L_1 ⟹ L_1 = L_1 + {(r(C_x → C_y), m_xy)}
18. While R ≠ {∅} repeat
      r(C_x → C_y) ∈ L_1, m_xy = Min(m_ij, L_1)
      ∀ R_i ∈ R, r(C_x → C_y) ∈ R_i ⟹ R = R − {R_i}
      G = G − {r(C_x → C_y)}
      L_1 = L_1 − {(r(C_x → C_y), m_xy)}
Stage 5 removes the multiple paths in the G graph with Rule 5. This stage proceeds as stage 4, but the R set contains only paths R_i of the form R_i ≡ {r(C_i → C_k-n), r(C_k-n → C_k-n+1), ..., r(C_k → C_j), r(C_i → C_l-n), r(C_l-n → C_l-n+1), ..., r(C_l → C_j)} (line 16). At the end of this stage, G contains no more multiple paths and it is guaranteed that the set of removed relations r(C_x → C_y) is optimal.
Stage 6 computes the conditional probability tables for G and finalizes the algorithm. The computation of the Conditional Probability Tables (CP Tables) is based on the numbering table N = [n_ij] of the Stochastic Representation of the ω sequence and on the property P(Y=C_o | X=C_i) + P(Y=¬C_o | X=C_i) = 1 (for simplicity, P(C_y|C_x) is rewritten P(y|x)).
For a root node C_x:
P(x) = (Σ_j n_xj) / (Σ_i Σ_j n_ij)
For a single relation r(C_x → C_y):
P(y|x) = n_xy / (Σ_j n_xj)
P(y|¬x) = ((Σ_i n_iy) − n_xy) / (Σ_i Σ_j n_ij − Σ_j n_xj)
For a set R = {r(C_x → C_y), r(C_z → C_y)} of two relations converging to the same C_y class:
P(y|x,z) = (n_xy + n_zy) / (Σ_j n_xj + Σ_j n_zj)
P(y|¬x,z) = (Σ_i n_iy − n_xy) / (Σ_i Σ_j n_ij − Σ_j n_xj)
P(y|x,¬z) = (Σ_i n_iy − n_zy) / (Σ_i Σ_j n_ij − Σ_j n_zj)
P(y|¬x,¬z) = (Σ_i n_iy − n_xy − n_zy) / (Σ_i Σ_j n_ij − Σ_j n_xj − Σ_j n_zj)
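Under the formulas above (with the denominator of P(y|x) read as the row total Σ_j n_xj of the counting matrix, which is what the numerical values of Figure 4 suggest), a minimal Python sketch of the CP table computation could be the following; the helper names are assumptions.

def cpt_single_parent(n, x, y):
    """P(y|x) and P(y|not x) for a single relation r(x -> y), from counts n[i][j]."""
    total = sum(sum(row.values()) for row in n.values())
    row_x = sum(n[x].values())                         # sum_j n_xj
    col_y = sum(row.get(y, 0) for row in n.values())   # sum_i n_iy
    n_xy = n[x].get(y, 0)
    p_y_given_x = n_xy / row_x
    p_y_given_not_x = (col_y - n_xy) / (total - row_x)
    return p_y_given_x, p_y_given_not_x

def cpt_two_parents(n, x, z, y):
    """CP table of y for two relations r(x -> y) and r(z -> y) converging on y."""
    total = sum(sum(row.values()) for row in n.values())
    row_x, row_z = sum(n[x].values()), sum(n[z].values())
    col_y = sum(row.get(y, 0) for row in n.values())
    n_xy, n_zy = n[x].get(y, 0), n[z].get(y, 0)
    return {
        (True, True):   (n_xy + n_zy) / (row_x + row_z),
        (False, True):  (col_y - n_xy) / (total - row_x),
        (True, False):  (col_y - n_zy) / (total - row_z),
        (False, False): (col_y - n_xy - n_zy) / (total - row_x - row_z),
    }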
The next section illustrates the computation of
the CP tables with a real world process.
6 REAL WORLD APPLICATION
The Apache system is a clone of Sachem, the knowledge based system that the Arcelor Group, one of the most important steel companies in the world, has developed to monitor and diagnose its production tools (Le Goc, 2004). Apache aims at controlling a zinc bath, a hot bath containing a liquid mixture of aluminum and zinc, continuously fed with aluminum and zinc ingots, in which a hot steel strip is immersed. Apache monitors and diagnoses around 11 variables and is able to detect around 24 types of alarms. The analyzed sequence ω contains 687 events of 13 classes for 11 discrete variables. The counting matrix N then contains 156 cells n_ij (Table 1). The corresponding M matrix is provided in Table 2, the F0/1 matrix in Table 3 and the normalized M matrix in Table 4. These matrices are computed in the first stage of the “BJM4BN” algorithm.
Stage 2 computes the L list of Table 5. The node of interest being 1006, the initial G graph resulting from stage 3 of the “BJM4BN” algorithm is given in Figure 3. This stage uses the normalized M matrix and starts with the 1006 column to add the relations r(1001, 1006) and r(1020, 1006) to the initial G graph. Next, the Build(G, 1006) function is executed to add the relation r(1004, 1001) to G before calling the Build(G, 1001) function, and so on.
Table 1: Counting matrix N of ω.
N 1001 1002 1004 1006 1014 1020 1022 1024 1025 1026 1029 1031 1037 TOTAL
1001
620150100604503
1002
411410000002402
1004
24020200301311
1006
10 5 7 35 1 15 1 6 20 4 21 23 0 148
1014
01040000140001
1020
1221805016251005
1022
0001000100200
1024
10230300216702
1025
2432207262633412112
1026
00080401200201
1029
4121507033931413010
1031
9622106171911136212
1037
0000010000210
TOTAL 39 26 19 148 11 51 4 25 124 18 102 116 4 687
Table 2: The M matrix of ω.
M      1001    1002    1004    1006    1014    1020    1022    1024    1025    1026    1029    1031    1037
1001   0.0731  0.0104  -0.6851 0.0193  -0.7048 -0.1189 -0.7478 -0.6877 -0.0023 -0.6860 -0.0138 -0.0073 -0.7478
1002   0.0901  0.0000  0.0136  -0.0137 0.4975  -0.8044 -0.5332 -0.5741 -1.4754 -0.5351 -0.0572 -0.0010 -0.5332
1004   0.0540  0.2722  -0.4630 -0.0678 -0.4035 0.0203  -0.3894 -0.5242 -0.0008 -0.4539 -0.1494 0.0000  0.7084
1006   0.0022  -0.0014 0.0279  0.0002  -0.1357 0.0053  0.0039  0.0012  -0.0032 0.0001  -0.0001 -0.0002 -2.0168
1014   -0.7094 0.1322  -0.4136 0.0476  -0.3017 -0.8745 -0.2377 -0.5036 -0.0625 0.5957  -1.4682 -1.6083 -0.2377
1020   -0.1232 0.0000  0.0135  0.0107  -0.8802 0.0054  -0.9419 -0.0496 -0.0170 0.0185  -0.0164 0.0011  -0.9419
1022   -0.7478 -0.5332 -0.4079 0.0039  -0.2569 -0.9276 -0.1304 0.5404  -1.7875 -0.3894 0.2021  -1.7067 -0.1304
1024   -0.0146 -0.5741 0.1290  -0.0457 -0.5041 0.0244  -0.5157 -0.5667 -0.0925 0.0226  0.0211  0.0225  -0.5157
1025   -0.1857 -0.0024 -0.0018 -0.0011 -1.6506 -0.0053 0.1506  0.0088  0.0008  -0.0005 0.0095  -0.0152 0.0204
1026   -0.6874 -0.5312 -0.4549 0.0557  -0.3895 0.1293  -0.3708 0.0294  -0.0276 -0.4451 -1.3792 -0.0194 -0.3708
1029   -0.0131 -0.2437 -0.0158 -0.0065 -1.4442 -0.0004 -1.5483 -0.0050 0.0124  0.0020  -0.0002 -0.0037 -1.5483
1031   0.0053  0.0070  -0.0390 -0.0017 -1.6411 -0.0138 0.0214  0.0191  -0.0007 -0.2127 -0.0132 0.0073  0.1528
1037   -0.7478 -0.5332 -0.4079 -2.0168 -0.2569 0.2355  -0.1304 -0.5157 -1.7875 -0.3894 0.2021  0.0268  -0.1304
Table 3: The F0/1 matrix of ω.
F0/1
1001 1002 1004 1006 1014 1020 1022 1024 1025 1026 1029 1031 1037
1001 1 1
1002 1 1
1004 1 1 1
1006 1 1 1
1014 1 1
1020 1 1 1
1022
1024 1 1
1025 11
1026 1 1
1029 1
1031 1 1
1037
Table 4: The normalized M matrix (nonzero cells only).
m(1001, 1006) = 0.0193
m(1002, 1001) = 0.0901   m(1002, 1014) = 0.4975
m(1004, 1001) = 0.0540   m(1004, 1002) = 0.2722   m(1004, 1020) = 0.0203
m(1006, 1001) = 0.0022   m(1006, 1020) = 0.0053
m(1014, 1002) = 0.1322   m(1014, 1026) = 0.5957
m(1020, 1006) = 0.0107   m(1020, 1031) = 0.0011
m(1024, 1004) = 0.1290   m(1024, 1020) = 0.0244
m(1025, 1029) = 0.0095
m(1026, 1020) = 0.1293   m(1026, 1024) = 0.0294
m(1029, 1025) = 0.0124
m(1031, 1001) = 0.0053
Table 5: The L list of Stage 2.
r(i, j)          M(i, j)
1014 → 1026      0.5957
1002 → 1014      0.4975
1004 → 1002      0.2722
1024 → 1004      0.1290
1002 → 1001      0.0901
1004 → 1001      0.0540
1026 → 1024      0.0294
1024 → 1020      0.0244
1004 → 1020      0.0203
1001 → 1006      0.0193
1020 → 1006      0.0107
1031 → 1001      0.0053
1020 → 1031      0.0011
The G graph of Figure 3 having no loops, stage 4 modifies nothing and stage 5 is executed with
this graph. Because each relation of the G graph participates in the seven paths leading to the nodes 1006 and 1001, the L_1 list is equal to the L list of Table 5. This table has been sorted so that the minimal value m_ij is at the end of the list. It is then easy to see that the relation r(1020, 1031) is the first removed relation, before the relations r(1020, 1006) and r(1004, 1001). The elimination of these three relations is sufficient to build the final G graph of Figure 4.
Figure 3: Initial G graph.
P(1031) = 0.176
P(1004) = 0.026
P(1014 | 1002) = 0.385
P(1014 | ¬1002) = 0.024
P(1026 | 1014) = 0.400
P(1026 | ¬1014) = 0.010
P(1024 | 1026) = 0.059
P(1024 | ¬1026) = 0.025
P(1001 | 1031, 1002) = 0.088
P(1020 | 1031, ¬1002) = 0.053
P(1020 | ¬1031, 1002) = 0.053
P(1020 | ¬1031, ¬1002) = 0.048
P(1020 | 1024) = 0.120
P(1020 | ¬1024) = 0.033
P(1006 | 1001) = 0.385
P(1006 | ¬1001) = 0.037
Figure 4: Final G graph.
The CP tables are computed using the N matrix (Table 1). This matrix provides all the information needed to compute the probabilities of the root nodes of Figure 4. A handmade graph built by the experts of the Arcelor Group in 2003 can be found in (Bouché, 2005) and (Le Goc, 2005). The G graph is entirely contained in the experts' graph. But the experts' graph does not contain the 1024 class: corresponding to an operator query for a chemical analysis, this class had been removed by the experts.
7 CONCLUSIONS
This paper shows that the BJM4BN algorithm is efficient in terms of pertinence, simplicity and speed. These properties come from the BJ-measure, which provides an operational way to orient the edges of a Bayesian Network without the exponential number of CI tests of Cheng's method. This is an advantage of using the time of the data to learn a dynamic Bayesian network. Our current work concerns the combination of the Timed Data Mining techniques of the TOM4L framework with the BJM4BN algorithm to define a global validation of the TOM4L learning process.
REFERENCES

Benayadi, N., Le Goc, M., (2008). Discovering Temporal Knowledge from a Crisscross of Timed Observations. Proceedings of the 18th European Conference on Artificial Intelligence (ECAI'08), University of Patras, Patras, Greece.

Bouché, P., Le Goc, M., Giambiasi, N., (2005). Modeling discrete event sequences for discovering diagnosis signatures. Proceedings of the Summer Computer Simulation Conference (SCSC05), Philadelphia, USA.

Cheeseman, P., Stutz, J., (1995). Bayesian classification (Auto-Class): Theory and results. Advances in Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, CA, p. 153-180.

Cheng, J., Bell, D., Liu, W., (1997). Learning Bayesian Networks from Data: An Efficient Approach Based on Information Theory.

Cheng, J., Greiner, R., Kelly, J., Bell, D., Liu, W., (2002). Learning Bayesian Networks from Data: An Information-Theory Based Approach. Artificial Intelligence, 137, 43-90.

Chickering, D. M., Geiger, D., Heckerman, D., (1994). Learning Bayesian Networks is NP-Hard. Technical Report MSR-TR-94-17, Microsoft Research, Microsoft Corporation.

Cooper, G. F., Herskovits, E., (1992). A Bayesian Method for the induction of probabilistic networks from data. Machine Learning, 9, 309-347.

Friedman, N., (1998). The Bayesian structural EM algorithm. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, San Francisco, CA, p. 129-138.

Heckerman, D., Geiger, D., Chickering, D. M., (1997). Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning Journal, 20(3).

Le Goc, M., Bouché, P., Giambiasi, N., (2005). Stochastic modeling of continuous time discrete event sequence for diagnosis. Proceedings of the 16th International Workshop on Principles of Diagnosis (DX'05), Pacific Grove, California, USA.

Le Goc, M., (2006). Notion d'observation pour le diagnostic des processus dynamiques: Application à Sachem et à la découverte de connaissances temporelles. HDR, Faculté des Sciences et Techniques de Saint Jérôme.

Myers, J., Laskey, K., Levitt, T., (1999). Learning Bayesian Networks from Incomplete Data with Stochastic Search Algorithms.

Pearl, J., (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, Calif.: Morgan Kaufmann.