this graph. Because each relation of the G graph participates in the seven paths leading to the nodes 1006 and 1001, the L_1 list is equal to the L list of Table 5. This table has been sorted so that the minimal value m_ij is at the end of the list. It is then easy to see that the relation r(1020, 1031) is the first relation to be removed, before the relations r(1020, 1006) and r(1004, 1001). The elimination of these three relations is sufficient to build the final G graph of Figure 4.
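The pruning step can be summarized with a short sketch. The following Python fragment is only illustrative: the Relation structure, the m_ij attribute, the numerical values and the stopping criterion (a fixed number of relations to remove) are assumptions made for the example, not the actual TOM4L / "BJT4BN" implementation.

from dataclasses import dataclass

@dataclass
class Relation:
    source: int   # discrete event class, e.g. 1020
    target: int   # discrete event class, e.g. 1031
    m: float      # minimal value m_ij of the relation (cf. Table 5)

def prune_weakest(relations, n_to_remove):
    """Sort the list so that the smallest m_ij values end up at the tail,
    then remove that tail (the weakest relations)."""
    ordered = sorted(relations, key=lambda r: r.m, reverse=True)
    return ordered[:-n_to_remove], ordered[-n_to_remove:]

# Hypothetical m_ij values chosen only to mirror the ordering described
# above: r(1020, 1031) is the weakest relation, then r(1020, 1006) and
# r(1004, 1001).
L = [Relation(1002, 1014, 0.31), Relation(1004, 1001, 0.05),
     Relation(1020, 1006, 0.04), Relation(1020, 1031, 0.02)]
kept, removed = prune_weakest(L, 3)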
Figure 3: Initial G graph.
Conditional probability values shown with the G graph of Figure 4:
P(1031) = 0.176
P(1004) = 0.026
P(1014 | 1002) = 0.385
P(1014 | ¬1002) = 0.024
P(1026 | 1014) = 0.400
P(1026 | ¬1014) = 0.010
P(1024 | 1026) = 0.059
P(1024 | ¬1026) = 0.025
P(1001 | 1031, 1002) = 0.088
P(1020 | 1031, ¬1002) = 0.053
P(1020 | ¬1031, 1002) = 0.053
P(1020 | ¬1031, ¬1002) = 0.048
P(1020 | 1024) = 0.120
P(1020 | ¬1024) = 0.033
P(1006 | 1001) = 0.385
P(1006 | ¬1001) = 0.037
Figure 4: Final G graph.
The conditional probability tables (CPTs) are computed using the N matrix (Table 1). This matrix provides all the information needed to compute the probabilities of the root nodes of Figure 4. A handmade graph built by the experts of the Arcelor Group in 2003 can be found in (Bouché, 2005) and (Le Goc, 2005). The G graph is entirely contained in the experts' graph. However, the experts' graph does not contain the 1024 class: this class corresponds to an operator query for a chemical analysis and was removed by the experts.
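As an illustration of this computation, the following Python sketch estimates probabilities from a count matrix. It assumes that N[i, j] counts the occurrences of class classes[j] observed after class classes[i] (as in a matrix like Table 1) and uses simple relative frequencies; these assumptions, the example counts and the function names are ours, not necessarily the exact estimator used by TOM4L.

import numpy as np

def conditional_probability(N, classes, child, parent):
    """Estimate P(child | parent) as n(parent, child) / n(parent, .)."""
    i, j = classes.index(parent), classes.index(child)
    return N[i, j] / N[i, :].sum()

def root_probability(N, classes, c):
    """Estimate the prior of a root node as the relative frequency
    of its class among all counted occurrences."""
    i = classes.index(c)
    return N[i, :].sum() / N.sum()

# Hypothetical counts for two classes, for illustration only:
classes = [1002, 1014]
N = np.array([[5, 10],
              [3,  2]])
print(conditional_probability(N, classes, 1014, 1002))  # 10 / 15 ≈ 0.667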
7 CONCLUSIONS
This paper shows that the “BJT4BN” algorithm is efficient in terms of pertinence, simplicity, and speed. These properties come from the BJ-measure, which provides an operational way to orient the edges of a Bayesian network without the exponential number of CI tests of Cheng's method. This illustrates the advantage of using the timing of the data to learn a dynamic Bayesian network. Our current work concerns the combination of the Timed Data Mining techniques of the TOM4L framework with the “BJT4BN” algorithm to define a global validation of the TOM4L learning process.
REFERENCES
Benayadi, N., Le Goc, M., (2008). Discovering Temporal Knowledge from a Crisscross of Timed Observations. To appear in the proceedings of the 18th European Conference on Artificial Intelligence (ECAI'08), University of Patras, Patras, Greece.
Bouché, P., Le Goc, M., Giambiasi, N., (2005). Modeling discrete event sequences for discovering diagnosis signatures. Proceedings of the Summer Computer Simulation Conference (SCSC05), Philadelphia, USA.
Cheeseman, P., Stutz, J., (1995). Bayesian classification (Auto-Class): Theory and results. Advances in Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, CA, p. 153-180.
Cheng, J., Bell, D., Liu, W., (1997). Learning Bayesian Networks from Data: An Efficient Approach Based on Information Theory.
Cheng, J., Greiner, R., Kelly, J., Bell, D., Liu, W., (2002). Learning Bayesian Networks from Data: An Information-Theory Based Approach. Artificial Intelligence, 137, 43-90.
Chickering, D. M., Geiger, D., Heckerman, D., (1994). Learning Bayesian Networks is NP-Hard. Technical Report MSR-TR-94-17, Microsoft Research, Microsoft Corporation.
Cooper, G. F., Herskovits, E., (1992). A Bayesian Method for the induction of probabilistic networks from data. Machine Learning, 9, 309-347.
Friedman, N., (1998). The Bayesian structural EM algorithm. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, San Francisco, CA, p. 129-138.
Heckerman, D., Geiger, D., Chickering, D. M., (1997). Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning Journal, 20(3).
Le Goc, M., Bouché, P., Giambiasi, N., (2005). Stochastic modeling of continuous time discrete event sequence for diagnosis. Proceedings of the 16th International Workshop on Principles of Diagnosis (DX'05), Pacific Grove, California, USA.
Le Goc, M., (2006). Notion d'observation pour le diagnostic des processus dynamiques: Application à Sachem et à la découverte de connaissances temporelles. HDR, Faculté des Sciences et Techniques de Saint Jérôme.
Myers, J., Laskey, K., Levitt, T., (1999). Learning Bayesian Networks from Incomplete Data with Stochastic Search Algorithms.
Pearl, J., (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, Calif.: Morgan Kaufmann.