Table 1: 1-tuple episodes of S.
1-tuple episode p | List of occurrence windows
A                 | [1,1], [2,2], [7,7]
B                 | [3,3], [8,8]
C                 | [4,4], [9,9]
E                 | [6,6], [10,10], [12,12]
F                 | [6,6], [10,10], [12,12]
EF                | [6,6], [10,10], [12,12]
prefix (the first event of the antecedent). The antecedent is denoted as ant. In the encoded sequence, each 1-tuple episode p is viewed as a prefix of the antecedent of an episode rule to be built. Once the prefix of an episode rule R is fixed, its occurrence windows OW(S, t_s, t_e) are known. For example, let minsupp = 2; then A can be considered as a prefix of an episode rule, and its list of occurrence windows is ([1,1], [2,2], [7,7]) (see Table 1).
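The occurrence windows of a 1-tuple episode are the degenerate intervals [t, t] at each timestamp whose itemset contains the event. As a minimal sketch, the snippet below recomputes Table 1 on a hypothetical sequence S reconstructed to be consistent with the table (the paper's actual sequence is not reproduced in this excerpt):

```python
# Hypothetical reconstruction of the example sequence S, consistent with Table 1.
S = {1: {"A"}, 2: {"A"}, 3: {"B"}, 4: {"C"}, 6: {"E", "F"},
     7: {"A"}, 8: {"B"}, 9: {"C"}, 10: {"E", "F"}, 12: {"E", "F"}}

def occurrence_windows(event, sequence):
    """A 1-tuple episode occurs in the window [t, t] at every timestamp t
    whose itemset contains the event."""
    return [(t, t) for t in sorted(sequence) if event in sequence[t]]

print(occurrence_windows("A", S))  # [(1, 1), (2, 2), (7, 7)]
```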
3.2.3 Consequent Identification
A candidate consequent of an antecedent ant (here, ant corresponds to a single element, the prefix) is chosen in the windows Win(S, t_s, w) where ant ⊆ I_{t_s}.
Recall that we want to form episode rules with a consequent as far as possible from the antecedent. Thus, the candidate consequents are not searched in the entire window, but only in Win_end, where the farthest candidates are. We construct P_end(ant), the ordered list of 1-tuple episodes that occur frequently in the Win_end of the windows Win(S, t_s, w) that contain ant.
Let p_j ∈ P_end(ant) be a candidate consequent of ant. The episode rule R : ant → p_j is formed and its support is computed (see Figure 3 line 7). At this stage, the occurrence windows of the episode rule R are filtered to get minimal occurrences as well as to preserve the anti-monotonicity property. This filtering is done by counting only once the occurrence windows containing the same occurrence of the consequent p_j. For example, let w_begin = 2, w_end = 2 and w = 6. The episode rule R : A → E has three occurrence windows: ([1,6], [2,6], [7,12]). In the first two occurrence windows, the consequent E is a common 1-tuple episode that occurs at timestamp 6. Therefore, we consider only the interval [2,6]. However, all occurrence windows are kept in memory, to be used to complete the antecedent. This avoids losing interesting episode rules that could be missed if we kept only the minimal occurrences in memory.
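The filtering step described above can be sketched as follows: occurrence windows that share the same occurrence of the consequent are collapsed to the one starting latest (the minimal occurrence), while the full window list is retained separately for antecedent completion. The function and data layout are illustrative, not the paper's implementation:

```python
def filter_minimal(windows, consequent_ts):
    """windows: list of (t_start, t_end) occurrence windows of a rule;
    consequent_ts: timestamp of the consequent occurrence in each window.
    For each distinct consequent occurrence, keep the window starting latest."""
    best = {}
    for (ts, te), ct in zip(windows, consequent_ts):
        if ct not in best or ts > best[ct][0]:
            best[ct] = (ts, te)
    return sorted(best.values())

windows = [(1, 6), (2, 6), (7, 12)]   # rule A -> E with w = 6
consequent_ts = [6, 6, 12]            # E occurs at t = 6 in the first two windows
print(filter_minimal(windows, consequent_ts))  # [(2, 6), (7, 12)]
```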
Next, we compute the support of the corresponding episode rules (see Figure 3 line 7). We define the support of a rule P → Q, referred to as supp_end(P.Q), as the number of minimal occurrence windows computed above. It differs from the traditional supp(P.Q) in that it considers only occurrence windows where P occurs in Win_begin and Q occurs in Win_end.
If R : ant → p_j is not frequent, we consider that p_j cannot be a consequent of ant. This iteration is stopped and the rule is discarded. There is no need to complete the antecedent of the rule R: whatever events complete the antecedent, the resulting rule will not be frequent. The algorithm then iterates on another consequent. If R : ant → p_j is frequent, its confidence is computed.
We define the confidence of a rule P → Q (see Equation (1)) as the probability that the consequent Q occurs in Win_end, given that P appears in Win_begin.

conf(P → Q) = supp_end(P.Q) / supp(P)    (1)
If the rule R : ant → p_j is confident, it is added to the set of rules formed by the algorithm. It is minimal and has a consequent far from the antecedent; it fulfills our goal. If the rule is frequent but not confident, the antecedent of the rule R : ant → p_j is completed (as described in the next subsection).
For example, let w = 6, w_begin = 2 and w_end = 2, and consider the episode rule R with prefix A. P_end(A) = [E, F, EF, A]. We first try to construct the rule R with the consequent E. Thus, for R : A → E, supp(R) = 2 and conf(R) = 2/3 ≈ 0.67. For minsupp = 2 and minconf = 0.7, R is frequent but not confident, so its antecedent has to be completed.
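The arithmetic of this worked example can be checked directly; the values below come from the text (two minimal occurrence windows for A → E, three occurrence windows for the prefix A):

```python
# Worked example: supp_end(A.E) counts the filtered minimal occurrence
# windows; supp(A) counts the occurrence windows of the prefix (Table 1).
supp_end_AE = 2          # windows [2,6] and [7,12] after filtering
supp_A = 3               # A occurs at t = 1, 2, 7
conf = supp_end_AE / supp_A
print(round(conf, 2))    # 0.67

minsupp, minconf = 2, 0.7
frequent = supp_end_AE >= minsupp   # True
confident = conf >= minconf         # False -> complete the antecedent
```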
3.2.4 Antecedent Completion
In this step, the antecedent, referred to as ant, is iteratively completed with 1-tuple episodes placed on its right side, within the limit of the predefined sub-window Win_begin. At the first iteration, ant is a single element (the prefix) (see Figure 3 line 15). Recall that we aim at forming rules having the last event of the antecedent as far as possible from the consequent, hence as close as possible to the prefix. Thus, we construct P_begin(ant), the ordered list of 1-tuple episodes that occur frequently after ant in the windows that start with it: the 1-tuple episodes that occur in Win_begin.
Similarly to the consequent identification step, the occurrence windows of the episode rule R are filtered to get minimal occurrences and to preserve the anti-monotonicity property. In addition, we apply the same support and confidence verifications as for the consequent identification.
To speed up the episode rule mining process, we use a heuristic: we order the list of candidates P_begin(ant) in descending order of the number of Win_begin sub-windows in which the candidates appear. We assume that this number is highly correlated with the support
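This ordering heuristic can be sketched as below; the function name and input layout are illustrative assumptions, with each Win_begin sub-window represented as the set of 1-tuple episodes it contains:

```python
from collections import Counter

def order_candidates(win_begin_contents):
    """win_begin_contents: one set of 1-tuple episodes per Win_begin sub-window.
    Return candidates sorted by the number of sub-windows they appear in,
    descending (assumed to correlate with the final rule support)."""
    counts = Counter()
    for content in win_begin_contents:
        counts.update(content)   # a set, so each window counts once per episode
    return [episode for episode, _ in counts.most_common()]

wins = [{"B", "C"}, {"B"}, {"B", "C"}]
print(order_candidates(wins))  # ['B', 'C']
```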
KDIR 2014 - International Conference on Knowledge Discovery and Information Retrieval