• n
i, j
≥ n
i, j
⇒ $p(j|i) \geq 0.5$. The class $C_i$ plays the role of a noisy class for the class $C_j$.
These two conditions are both evaluated when comparing the product $p(j|i) \cdot p(i|j)$ with $\frac{1}{2} \cdot \frac{1}{2}$: when $p(j|i) \cdot p(i|j) \leq \frac{1}{4}$, $M(C_i, C_j) \leq 0.5$ and the relation $R_{i,j}(C_i, C_j)$ cannot be justified with the M-measure.
Conversely, when $p(j|i) \cdot p(i|j) > \frac{1}{4}$, $M(C_i, C_j) > 0.5$ and the relation $R_{i,j}(C_i, C_j)$ is of some interest from the point of view of the M-measure. This leads to the following simple induction rule, which uses the M-measure as the interestingness criterion:
$$M(C_i, C_j) > 0.5 \Rightarrow R_{i,j}(C_i, C_j) \in I \qquad (2)$$
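Rule (2) can be sketched as a filter over candidate binary relations. This is a minimal sketch, assuming the two conditional probabilities $p(j|i)$ and $p(i|j)$ of each candidate pair have already been estimated from the sequences; since the text states that $M(C_i, C_j) > 0.5$ exactly when $p(j|i) \cdot p(i|j) > \frac{1}{4}$, the filter applies that product test rather than computing $M$ itself. The function name `induce` and its input format are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (C_i, C_j)


def induce(candidates: Dict[Pair, Tuple[float, float]]) -> Set[Pair]:
    """Return the pairs (C_i, C_j) whose relation R_{i,j} is kept in I by rule (2).

    `candidates` maps (C_i, C_j) -> (p(j|i), p(i|j)); this input format is a
    hypothetical choice for illustration. M(C_i, C_j) > 0.5 is tested through
    the equivalent condition p(j|i) * p(i|j) > 1/4 stated in the text.
    """
    induced: Set[Pair] = set()
    for pair, (p_j_given_i, p_i_given_j) in candidates.items():
        if p_j_given_i * p_i_given_j > 0.25:  # strict: a product of exactly 1/4 is rejected
            induced.add(pair)
    return induced
```

With invented probabilities, `induce({("A", "B"): (0.9, 0.8), ("A", "C"): (0.5, 0.5)})` keeps only `("A", "B")`: the second pair sits exactly on the $\frac{1}{4}$ boundary and is therefore not justified by the M-measure.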
So, in the illustrative example, the set $I$ of induced binary relations contains only two binary relations: $I = \{R_{1,H}(C_1, C_H, [\tau^-_{1,H}, \tau^+_{1,H}]),\ R_{0,L}(C_0, C_L, [\tau^-_{0,L}, \tau^+_{0,L}])\}$.
3.4 Deduction of n-ary Relations
The set $I$ of binary relations thus contains the minimal subset of $R$ in which each relation $R_{i,j}(C_i, C_j)$ is of potential interest. The objective of the deduction step is to deduce from $I$ a small set $M = \{m_{k-1,n}\}$ of n-ary relations $m_{k-1,n}$ so that a search algorithm can be used effectively to identify the most representative relations $m_{k-1,n}$. To this end, a heuristic $h(m_{i,n})$ is used to select a minimal set $M = \{m_{k,n}\}$ of n-ary relations of the form $m_{k,n} = \{R_{i,i+1}(C_i, C_{i+1})\}$, $i = k, \dots, n-1$, that is to say, paths leading to a particular final observation class $C_n$. The heuristic $h(m_{i,n})$ makes a compromise between the generality and the quality of a path $m_{i,n}$:
$$h(m_{i,n}) = \mathrm{card}(m_{i,n}) \times BJL(m_{i,n}) \times P(m_{i,n}) \qquad (3)$$
In this equation, $\mathrm{card}(m_{i,n})$ is the number of relations in $m_{i,n}$, $BJL(m_{i,n})$ is the sum of the BJL-measures $BJL(C_{k-1}, C_k)$ of each relation $R_{k-1,k}(C_{k-1}, C_k)$ in $m_{i,n}$, and $P(m_{i,n})$ is the product of the probabilities associated with each relation in $m_{i,n}$.
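Equation (3) can be evaluated directly once each relation of a path carries its BJL-measure and its transition probability. The sketch below assumes, hypothetically, that a path is given as an ordered list of `(bjl, p)` pairs, one per relation $R_{k-1,k}$; the function name `heuristic` is likewise illustrative.

```python
import math
from typing import List, Tuple


def heuristic(path: List[Tuple[float, float]]) -> float:
    """h(m) = card(m) * BJL(m) * P(m), following equation (3).

    Each relation R_{k-1,k} of the path is represented (hypothetically) by a
    pair (BJL(C_{k-1}, C_k), p(k-1, k)).
    """
    card = len(path)                      # number of relations in the path
    bjl = sum(b for b, _ in path)         # sum of the BJL-measures
    prob = math.prod(p for _, p in path)  # product of the transition probabilities
    return card * bjl * prob
```

For example, a two-relation path with invented values `[(0.5, 0.8), (0.3, 0.5)]` gives $2 \times 0.8 \times 0.4 = 0.64$: adding a relation raises $\mathrm{card}$ and the BJL sum but can only lower the probability product, which is the generality/quality compromise described above.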
$P(m_{i,n})$ corresponds to the Chapman-Kolmogorov probability of a path in the transition matrix $P = [p(k-1, k)]$ of the Stochastic Representation. Since the interestingness heuristic $h(m_{i,n})$ is of the form $\varphi \cdot \ln(\varphi)$, it can be used to build all the paths $m_{i,n}$ where $h(m_{i,n})$ is maximum (Benayadi and Le Goc, 2008). For the illustrative example, the deduction step found a set $M$ of two binary relations ($M = I$)¹.
¹ No path containing more than one binary relation can be deduced from $I$.
Figure 3: Evaluation Measures for n-ary relation. Cover Rate = (number of target class occurrences predicted by the model) / (total number of target class occurrences); TP: True Positive Prediction; FP: False Positive Prediction.
3.5 Find Representativeness n-ary Relations
Given a set $M = \{m_{k,n}\}$ of paths $m_{k,n} = \{R_{i,i+1}(C_i, C_{i+1})\}$, $i = k, \dots, n-1$, the fourth and final step of the TOM4L discovery process, step Find, uses two representativeness criteria (Cover Rate and Anticipation Rate, Figure 3) to build the subset $S \subseteq M$ containing the paths $m_{k,n}$ that are representative according to the initial set $\Omega$ of sequences. These paths are called Signatures.
Generally, a threshold of 50% is used to discard the n-ary relations that make more false predictions than correct predictions. In the illustrative example, the cover rate and the anticipation rate of both binary relations of $M$ are 100%. So, $S = M = \{R_{1,H}(C_1, C_H, [\tau^-_{1,H}, \tau^+_{1,H}]),\ R_{0,L}(C_0, C_L, [\tau^-_{0,L}, \tau^+_{0,L}])\}$.
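The Find step can be sketched as a threshold filter over the paths of $M$. The Cover Rate follows Figure 3; the Anticipation Rate is assumed here to be $TP/(TP+FP)$, the share of a path's predictions that are correct, which matches the 50% rule for discarding relations with more false than correct predictions but is not spelled out in the recoverable text. The `PathStats` record, its fields and `signatures` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathStats:
    name: str
    tp: int        # true positive predictions of the target class
    fp: int        # false positive predictions
    n_target: int  # total number of target class occurrences in the sequences

    def cover_rate(self) -> float:
        # Predicted target occurrences / all target occurrences (Figure 3).
        # Assumes each true positive predicts one distinct target occurrence.
        return self.tp / self.n_target if self.n_target else 0.0

    def anticipation_rate(self) -> float:
        # Assumed definition: TP / (TP + FP).
        total = self.tp + self.fp
        return self.tp / total if total else 0.0


def signatures(paths: List[PathStats], threshold: float = 0.5) -> List[str]:
    """Keep the paths whose cover rate and anticipation rate both reach the threshold."""
    return [p.name for p in paths
            if p.cover_rate() >= threshold and p.anticipation_rate() >= threshold]
```

A path with 10 correct predictions, no false ones and 10 target occurrences reaches 100% on both rates and is kept, as both binary relations of $M$ are in the illustrative example.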
These signatures are the only relations (patterns) that are linked with the system $y(t) = Fx(t)$. Compared with the set of patterns found by Apriori-like approaches, this illustrative example confirms that the TOM4L approach converges towards a minimal set of operational relations, which describe the dynamics of the process. In the next section, we present the application of TOM4L to a sequence generated by a very complex dynamic process, the blast furnace process. Given the complexity of this process, we can assert, without experimentation, that Apriori-like approaches fail to mine this sequence.
4 APPLICATION
Our approach has been applied to sequences generated by SACHEM, a knowledge-based system developed to monitor, diagnose and control the blast furnace (Le Goc, 2006). We are interested in the omega variable, which reveals the wrong management of the whole blast furnace. The studied sequence comes from Sachem at Fos-Sur-Mer (France), from 08/01/2001 to 31/12/2001. It contains 7682 occurrences of 45 discrete event classes (i.e. phenomena). For the 1463 class linked to the omega variable, the search space contains about $20^5 = 3{,}200{,}000$ binary relations. The inductive and the abductive reasoning
ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence