Episode Rules Mining Algorithm for Distant Event Prediction
Lina Fahed, Armelle Brun and Anne Boyer
Université de Lorraine, LORIA,
Équipe KIWI, Campus Scientifique, BP 239, 54506 Vandoeuvre-lès-Nancy Cedex, France
Keywords:
Data Mining, Episode Rules Mining, Minimal Rules, Distant Event Prediction.
Abstract:
This paper focuses on event prediction in an event sequence, where we aim at predicting distant events. We
propose an algorithm that mines episode rules, which are minimal and have a consequent temporally distant
from the antecedent. As traditional algorithms are not able to mine directly rules with such characteristics,
we propose an original way to mine these rules. Our algorithm, which has a complexity similar to that of
state of the art algorithms, determines the consequent of an episode rule at an early stage in the mining
process, it applies a span constraint on the antecedent and a gap constraint between the antecedent and the
consequent. A new confidence measure, the temporal confidence, is proposed, which evaluates the confidence
of a rule in relation to the predefined gap. The algorithm is validated on an event sequence of social networks
messages. We show that minimal rules with a distant consequent are actually formed and that they can be used
to accurately predict distant events.
1 INTRODUCTION
The flow of messages posted in blogs and social net-
works is an important and valuable source of infor-
mation that can be analyzed, modeled (through the
extraction of hidden relationships) and from which in-
formation can be predicted, which is the focus of our
work. For example, companies may be interested in
the prediction of what will be said about them in so-
cial networks. Similarly, this prediction can be a way
to recommend items. We consider that the sooner an
event is predicted, the more useful this prediction is
for the company or the person concerned, since it leaves enough time to act before the occurrence of the event. Predicting distant events is thus
the focus of our work.
Temporal data mining is related to the mining
of sequential patterns ordered by a given criterion
such as time or position (Laxman and Sastry, 2006).
Episode mining is the appropriate pattern discovery task for the case where the data is made up of a single long sequence. An episode is a temporal pattern
made up of “relatively close” partially ordered items
(or events), which often appears throughout the se-
quence or in a part of it (Mannila et al., 1997). When
the order of items is total, the episode is said to be se-
rial. Similarly to the extraction of association rules
from itemsets, episode rules can be extracted from
episodes to predict events (Cho et al., 2011). The
rule mining task is usually decomposed into two sub-
problems. The first one is the discovery of frequent
itemsets or episodes that have a support higher than a
predefined threshold. The second one is the genera-
tion of rules from those frequent itemsets or episodes,
with the constraint of a minimal confidence thresh-
old (Agrawal et al., 1993). In general, rules are gen-
erated by considering some items in the itemset (or
the last items in the episode) as the consequent of the
rule, and the rest of the items as the antecedent. Since
the second sub-problem is quite straightforward, most research focuses on the first one: the extraction of itemsets or episodes.
Episode and episode rules mining are used in
many areas, such as telecommunication alarm man-
agement (Mannila et al., 1997), intrusion detec-
tion (Luo and Bridges, 2000), discovery of relations between financial events (Ng and Fu, 2003), etc.
Our goal is to reliably predict events that will oc-
cur after a predetermined temporal distance, in order
to have enough time to act before the occurrence of
events. Therefore, serial episode rules with a conse-
quent distant from the antecedent will be mined. Tra-
ditional episode rules mining algorithms form episode
rules with a consequent close to the antecedent. To
mine rules with a distant consequent, these algorithms
have to perform a post-processing step: the extracted
rules are filtered to keep only rules with a consequent
that may occur far from the antecedent. This post-
processing is not only time consuming, but the time spent mining rules that will eventually be filtered out is also wasted. Thus, we propose a new algorithm for
mining serial episode rules, with a consequent tem-
porally distant from the antecedent, to predict dis-
tant events, and with a small antecedent (in number
of events and in time), to be able to predict events
as soon as possible. This algorithm has a complexity
equal to that of traditional algorithms.
An example of a required rule is presented below (from a sequence of annotated messages of blogs about finance issues, where each event includes a sentiment polarity): $R$: (interest rate, neutral), (credit, negative), (waiting, neutral) $\Rightarrow$ (competition, negative); the antecedent occurs within 5 days, and the gap between the antecedent and the consequent is 15 days.
The rest of this paper is organized as follows: sec-
tion 2 presents related works about episode rules min-
ing. Our algorithm is introduced in section 3, fol-
lowed by experimental results in section 4. We con-
clude and provide some perspectives in section 5.
2 RELATED WORKS
We first start by introducing a few concepts. Let $I = \{i_1, i_2, \ldots, i_m\}$ be a finite set of items. $I_t$ is the set of items that occur at a timestamp $t$, referred to as an event. An event sequence $S$ is an ordered list of events, $S = \langle (t_1, I_{t_1}), (t_2, I_{t_2}), \ldots, (t_n, I_{t_n}) \rangle$ with $t_1 < t_2 < \ldots < t_n$ (see Figure 1). A serial episode $P = \langle p_1, p_2, \ldots, p_k \rangle$ on $I^k$ is an ordered list of events. Its support, denoted $supp(P)$, represents the number of occurrences of $P$, according to a frequency measure. $P$ is said to be a frequent episode if $supp(P) \geq minsupp$, where $minsupp$ is a predefined minimal threshold. An occurrence window of the episode $P$ is a segment $\langle I_{t_s}, \ldots, I_{t_e} \rangle$ of the sequence, denoted $OW(S, t_s, t_e)$, which starts at timestamp $t_s$ and ends at timestamp $t_e$, where $P \sqsubseteq \langle I_{t_s}, \ldots, I_{t_e} \rangle$, $p_1 \subseteq I_{t_s}$ and $p_k \subseteq I_{t_e}$. It represents the interval that bounds the episode. Let $P, Q$ be two episodes. An episode rule $R : P \Rightarrow Q$ means that $Q$ appears after $P$. The confidence of this episode rule is the probability of finding $Q$ after $P$: $conf(P \Rightarrow Q) = supp(P \cdot Q) / supp(P)$. The rule is said to be confident if its confidence exceeds a predefined threshold $minconf$.
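To make these definitions concrete, here is a minimal Python sketch of an event sequence and of a serial-episode occurrence test. It is our illustration only (the type alias Sequence and the function occurs_in_window are ours, and events are restricted to single items), not an implementation from the literature.

```python
from typing import Dict, List, Set

# An event sequence S: timestamp -> set of items occurring at that timestamp.
Sequence = Dict[int, Set[str]]

def occurs_in_window(episode: List[str], s: Sequence, t_s: int, t_e: int) -> bool:
    """Check whether the serial episode <p_1, ..., p_k> occurs in the
    segment <I_ts, ..., I_te>, respecting the total order of its events."""
    pos = 0
    for t in range(t_s, t_e + 1):          # scan the segment left to right
        if pos < len(episode) and episode[pos] in s.get(t, set()):
            pos += 1                        # greedily match the next event
    return pos == len(episode)

# A fragment of the sequence of Figure 1, as recovered from Table 1:
S: Sequence = {1: {"A"}, 2: {"A"}, 3: {"B"}, 4: {"C"}, 6: {"E", "F"}}
assert occurs_in_window(["A", "B", "E"], S, 2, 6)   # <A, B, E> occurs in [2, 6]
```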
Winepi and Minepi are seminal episode mining
algorithms (Mannila et al., 1997), and are the basis
of many other algorithms. To extract episodes, both
algorithms start by extracting 1-tuple episodes (made
up of one item), then iteratively seek larger ones by
merging items on their right side. This approach is
the one used by many other algorithms (Laxman et al.,
2007) (Huang and Chang, 2008).
In the frequent itemsets mining task (which in-
cludes episode mining), it is suggested that anti-
monotonicity is a common property that has to be re-
spected by any frequency measure (Agrawal et al.,
1993). Several frequency measures for episode min-
ing have been proposed. In (Mannila et al., 1997),
window-based and minimal occurrence-based fre-
quency measures have been introduced through both
Winepi and Minepi. Winepi evaluates the frequency
of an episode as the number of windows of length
w that contain the episode. Minepi evaluates the
frequency of an episode as the number of minimal
windows that contain the episode. A minimal win-
dow is a window such that no sub-window con-
tains the episode. The non-overlapped occurrence-based frequency measure was proposed in (Laxman et al., 2007), where two occurrences of an episode are non-overlapped if no item of one occurrence appears between items of the other. It is shown that non-overlapped occurrence-based algorithms are much more efficient in terms of the space and time needed. When mining serial episodes, additional constraints on the episodes may be imposed, such as the span constraint (Achar et al., 2013), which imposes an upper bound (in distance or time) between the first and last events in the occurrence of an episode. This constraint has been introduced mainly for complexity reasons. Another constraint is the gap constraint (Méger and Rigotti, 2004), which imposes an upper bound between successive events in the occurrence of an episode. If the extracted serial episodes have to represent causative chains, then such constraints are important.
Traditional episode rules mining algorithms construct episode rules with a large antecedent (made up of many events) (Pasquier et al., 1999). Discovering rules with a small antecedent was introduced for association rules under the name "minimal association rules discovery" (Rahal et al., 2004). Minimal rules are considered sufficient, as no new knowledge is brought by larger ones; the constraint is that the consequents are fixed in advance by the user. Minimal rules have also been studied with the aim of reducing the time and space complexity of the mining task, as well as avoiding redundancy in the resulting set of rules (Neeraj and Swati, 2012). These works focus on association rules; recall that we want to form episode rules.
Mining episodes in an event sequence is a task
which has received much attention (Gan and Dai,
2011). In an event sequence each data element may
contain several items (an event). In (Huang and
Chang, 2008), an algorithm Emma, is presented,
where the event sequence is encoded with frequent
KDIR2014-InternationalConferenceonKnowledgeDiscoveryandInformationRetrieval
6
1 2 3 4 5 6
7
8 9 10 11 12
H
A
A B
C
I
D F
E
A B
C
F
E
G
F
E
t
Figure 1: Example of an event sequence S.
itemsets then serial episodes are mined. In (Gan
and Dai, 2011), episodes are first extracted, then non-
derivable episodes rules are formed (where no rule
can be derived from another).
3 THE PROPOSED ALGORITHM
3.1 Principle
Our goal is to form episode rules that can be used to efficiently predict distant events.
To achieve this goal, the episode rules formed must have a consequent that is temporally distant from the antecedent. Traditional algorithms are not designed to form such rules. Recall that they first form episodes from left to right by iteratively appending events temporally close to the episode being formed (the minimal occurrence), within the limit of the predefined span. Then, the episode rule is built by considering the last element(s) of the episode as the consequent of the rule. Moreover, when forming these episodes, it is impossible to know whether the event being appended will be part of the consequent, so it is impossible to constrain its distance to the other events while forming the episode. The only way to mine rules with a consequent distant from the antecedent is then to mine all rules and filter the occurrences that respect this distance; due to the limited span, this distance cannot be large. Thus, we propose to mine episode rules without any episode mining phase. To be able to constrain the distance between the antecedent and the consequent, we propose to determine the consequent early in the mining process. By determining the consequent at an early stage, the occurrence windows of an episode rule are filtered early, the search space is pruned, and no post-processing is required.
We also aim at predicting events early. We assume that the smaller the antecedent of a rule (in number of events and in time), the earlier it ends, and the earlier the consequent can be predicted. Therefore, we propose to extract episode rules with an additional characteristic: an antecedent as small as possible (in number of events and in time), which we call "minimal episode rules". For that, we apply the traditional minimal occurrence-based frequency measure.
The required characteristics force us to propose an approach totally different from traditional ones. We propose an episode rules mining algorithm where the prefix (the first event) of a rule is fixed first. Then, the consequent is determined, so as to constrain its distance from the prefix. Finally, the antecedent is completed. Before explaining our algorithm, we present the new concepts on which it relies.
Sub-windows of $Win(S, t_s, w)$: Let $Win(S, t_s, w)$ be a window in the sequence $S$ of length $w$ that starts at $t_s$, with its first element containing the prefix (the first event) of an episode rule to be built. In order to mine episode rules with a distant consequent and a minimal antecedent, we propose to split this window into three sub-windows as follows (see Figure 2):

$Win_{begin}$ is a segment of $Win(S, t_s, w)$ of length $w_{begin} < w$, starting at $t_s$. $Win_{begin}$ can be viewed as an expiry time for the antecedent of an episode rule. It represents the span of the antecedent of an episode rule and guarantees that the antecedent occurs within a determined time.

$Win_{end}$ is a segment of $Win(S, t_s, w)$ of length $w_{end} < w$ that ends at $t_s + w$. $Win_{end}$ represents the time window in which the consequent occurs.

$Win_{between}$ is the remaining sub-window, of length $w_{between}$, in which neither the antecedent nor the consequent can appear. $Win_{between}$ guarantees the temporal distance between the antecedent and the consequent of an episode rule. It represents a minimal gap between the antecedent and the consequent and guarantees that the consequent is far from the antecedent.

Figure 2: Sub-windows of $Win(S, t_s, w)$.
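A minimal sketch of this split, assuming integer timestamps, inclusive window bounds $[t_s, t_s + w - 1]$ and the relation $w_{begin} + w_{between} + w_{end} = w$ (this boundary convention is our assumption, and the function name is illustrative):

```python
def split_window(t_s: int, w: int, w_begin: int, w_end: int):
    """Return the inclusive timestamp ranges of Win_begin, Win_between
    and Win_end inside Win(S, t_s, w)."""
    w_between = w - w_begin - w_end
    assert w_between >= 0, "w must cover the three sub-windows"
    win_begin = (t_s, t_s + w_begin - 1)                # span of the antecedent
    win_between = (t_s + w_begin, t_s + w - w_end - 1)  # forbidden gap
    win_end = (t_s + w - w_end, t_s + w - 1)            # span of the consequent
    return win_begin, win_between, win_end

# With the paper's running example (t_s = 1, w = 6, w_begin = 2, w_end = 2):
# Win_begin = [1, 2], Win_between = [3, 4], Win_end = [5, 6].
```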
3.2 Steps of the Algorithm
We present now the different steps of our algorithm.
3.2.1 Initialization
Our algorithm starts by an initialization phase, which
reads the event sequence to extract all frequent events
and their associated occurrence timestamps. An event
represents a 1-tuple episode and will be denoted as
P. Table 1 presents the list of 1-tuple episodes of the
sequence S (Figure 1) and their associated occurrence
windows for minsupp = 2 (see Figure 3 line 2).
Table 1: 1-tuple episodes of S.

1-tuple episode p    List of occurrence windows
A                    [1,1], [2,2], [7,7]
B                    [3,3], [8,8]
C                    [4,4], [9,9]
E                    [6,6], [10,10], [12,12]
F                    [6,6], [10,10], [12,12]
EF                   [6,6], [10,10], [12,12]

3.2.2 Prefix Identification

Episode rules are built iteratively by first fixing the
prefix (the first event of the antecedent). The an-
tecedent is denoted as ant. In the encoded sequence,
each 1-tuple episode p is viewed as a prefix of the
antecedent of an episode rule to be built. Once the
prefix of an episode rule $R$ is fixed, its occurrence windows $OW(S, t_s, t_e)$ are known. For example, let $minsupp = 2$; $A$ can then be considered as a prefix of an episode rule, and the list of occurrence windows of $A$ is ([1,1], [2,2], [7,7]) (see Table 1).
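The initialization scan that produces Table 1 and supplies the prefixes can be sketched as follows; this is a hedged illustration restricted to single-item 1-tuple episodes (the paper also forms 1-tuple episodes that are itemsets, such as EF), and the helper name is ours:

```python
from collections import defaultdict
from typing import Dict, List, Set

def frequent_singletons(s: Dict[int, Set[str]], minsupp: int) -> Dict[str, List[int]]:
    """Scan the sequence once and keep the items occurring at least minsupp
    times; each surviving item is a 1-tuple episode and a candidate prefix."""
    occurrences = defaultdict(list)
    for t in sorted(s):
        for item in s[t]:
            occurrences[item].append(t)
    return {p: ts for p, ts in occurrences.items() if len(ts) >= minsupp}

# On the sequence of Figure 1 with minsupp = 2 this yields, as in Table 1:
# A -> [1, 2, 7], B -> [3, 8], C -> [4, 9], E -> [6, 10, 12], F -> [6, 10, 12].
```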
3.2.3 Consequent Identification
A candidate consequent of an antecedent $ant$ (here, $ant$ corresponds to a unique element, the prefix) is chosen in the windows $Win(S, t_s, w)$ where $ant \subseteq I_{t_s}$. Recall that we want to form episode rules with a consequent as far as possible from the antecedent. Thus, the candidate consequents are not searched in the entire window; they are searched only in $Win_{end}$, where the farthest candidates are. We construct $P_{end}(ant)$, the ordered list of 1-tuple episodes that occur frequently in the $Win_{end}$ of the windows $Win(S, t_s, w)$ that contain $ant$.
Let $p_j \in P_{end}(ant)$ be a candidate consequent of $ant$. The episode rule $R : ant \Rightarrow p_j$ is formed and its support is computed (see Figure 3, line 7). At this stage, the occurrence windows of the episode rule $R$ are filtered to get minimal occurrences as well as to preserve the anti-monotonicity property. This filtering is done by counting only once the occurrence windows containing the same occurrence of the consequent $p_j$. For example, let $w_{begin} = 2$, $w_{end} = 2$ and $w = 6$. The episode rule $R : A \Rightarrow E$ has three occurrence windows: ([1,6], [2,6], [7,12]). In the first two occurrence windows, the consequent $E$ is a common 1-tuple episode which occurs at timestamp $t_6$. Therefore, we consider only the interval [2,6]. However, all occurrence windows are kept in memory, to be used to complete the antecedent. This avoids losing interesting episode rules which could be missed if we kept only the minimal occurrences in memory. Next, we compute the support of the corresponding episode rules (see Figure 3, line 7).

We define the support of a rule $P \Rightarrow Q$, referred to as $supp_{end}(P \cdot Q)$, as the number of minimal occurrence windows computed above. It differs from the traditional $supp(P \cdot Q)$ in that it considers only occurrence windows where $P$ occurs in $Win_{begin}$ and $Q$ occurs in $Win_{end}$.
If $R : ant \Rightarrow p_j$ is not frequent, we consider that $p_j$ cannot be a consequent of $ant$. This iteration is stopped and the rule is discarded. There is no need to complete the antecedent of the rule $R$: whatever the events that complete the antecedent, the resulting rule will not be frequent. The algorithm then iterates on another consequent. If $R : ant \Rightarrow p_j$ is frequent, its confidence is computed.

We define the confidence of a rule $P \Rightarrow Q$ (see Equation (1)) as the probability that the consequent occurs in $Win_{end}$, given that $P$ appears in $Win_{begin}$:

$$conf(P \Rightarrow Q) = \frac{supp_{end}(P \cdot Q)}{supp(P)} \qquad (1)$$
If the rule $R : ant \Rightarrow p_j$ is confident, it is added to the set of rules formed by the algorithm. It is minimal and has a consequent far from the antecedent; it fulfills our goal. If the rule is frequent but not confident, the antecedent of the rule $R : ant \Rightarrow p_j$ is completed (as described in the next subsection).

For example, let $w = 6$, $w_{begin} = 2$ and $w_{end} = 2$. For the episode rule $R$ with prefix $A$, $P_{end}(A) = [E, F, EF, A]$. We first try to construct the episode rule $R$ with the consequent $E$. Thus, for $R : A \Rightarrow E$, $supp(R) = 2$ and $conf(R) = 2/3 \approx 0.67$. For $minsupp = 2$ and $minconf = 0.7$, $R$ is frequent but not confident, so its antecedent has to be completed.
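A hedged sketch of this consequent identification step, under the same single-item simplification as above (names are illustrative). Counting each occurrence of a candidate consequent only once reproduces the minimal-occurrence filtering: the windows opened at $t_s = 1$ and $t_s = 2$ share the occurrence of $E$ at $t_6$, so $supp_{end}$ counts it once.

```python
from typing import Dict, List, Set

def candidate_consequents(s: Dict[int, Set[str]],
                          prefix_occs: List[int],
                          w: int, w_end: int,
                          minsupp: int) -> Dict[str, int]:
    """For each window Win(S, t_s, w) opened by an occurrence of the prefix,
    search candidate consequents only in Win_end, and count each consequent
    occurrence once, which gives supp_end for every frequent candidate."""
    seen: Dict[str, Set[int]] = {}
    for t_s in prefix_occs:
        for t in range(t_s + w - w_end, t_s + w):     # timestamps of Win_end
            for item in s.get(t, set()):
                # a set of timestamps counts a shared occurrence only once
                seen.setdefault(item, set()).add(t)
    return {q: len(ts) for q, ts in seen.items() if len(ts) >= minsupp}

# Example: prefix A with occurrences [1, 2, 7], w = 6, w_end = 2, minsupp = 2
# gives supp_end(A => E) = 2 (E at t_6 and t_12), hence conf = 2/3 as in the text.
```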
3.2.4 Antecedent Completion
In this step, the antecedent, referred to as $ant$, is iteratively completed with 1-tuple episodes placed on its right side, within the limit of the predefined sub-window $Win_{begin}$. At the first iteration, $ant$ is a unique element (the prefix) (see Figure 3, line 15). Recall that we aim at forming rules having the last event of the antecedent as far as possible from the consequent, hence as close as possible to the prefix. Thus, we construct $P_{begin}(ant)$, the ordered list of 1-tuple episodes that occur frequently after $ant$ in the windows that start with it: the 1-tuple episodes that occur in $Win_{begin}$. Similarly to the consequent identification step, the occurrence windows of the episode rule $R$ are filtered to get minimal occurrences and to preserve the anti-monotonicity property. In addition, we apply the same support and confidence verifications as for the consequent identification.
To speed up the episode rules mining process, we use a heuristic. We propose to order the list of candidates $P_{begin}(ant)$ in descending order of the number of $Win_{begin}$ in which the candidates appear. We assume that this number is highly correlated with the support of the corresponding episode rules. So, in the traversal of this list, when we observe that candidates tend to form infrequent episode rules (several consecutive 1-tuple episodes lead to infrequent episode rules), we stop the traversal. We consider that the remaining candidates in this list will lead to infrequent episode rules. Although this heuristic may discard interesting rules, it reduces the number of iterations and thus makes it possible to increase the size of the span of the rule ($Win$).
For example, let $minsupp = 2$ and $minconf = 0.7$. For $R : A \Rightarrow E$, $P_{begin}(A) = [B, C, A]$. The antecedent of $R$ is completed with $B$, forming the episode rule $R : A, B \Rightarrow E$. Thus, $supp(R) = 2$ and $conf(R) = 2/2 = 1$. The episode rule $R$ is now confident, and the phase of completing its antecedent is stopped.
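The traversal and early-stopping logic of the heuristic above can be sketched as follows; the stopping threshold max_failures and the callback try_candidate (which tests whether extending $ant$ with a candidate yields a frequent rule) are our assumptions, and the full recursive completion is not shown:

```python
from typing import Callable, List

def traverse_candidates(ordered_candidates: List[str],
                        try_candidate: Callable[[str], bool],
                        max_failures: int = 3) -> List[str]:
    """Traverse candidates sorted by decreasing number of Win_begin windows
    in which they appear; stop after several consecutive failed extensions."""
    kept, failures = [], 0
    for p in ordered_candidates:
        if try_candidate(p):      # the extension ant + p is frequent
            kept.append(p)
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                break             # remaining candidates assumed infrequent
    return kept
```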
3.3 Temporal Confidence
$Win_{between}$ has been introduced so as to guarantee that the consequent of an episode rule occurs in $Win_{end}$, at a temporal distance of $w_{begin} + w_{between}$ from the prefix of the episode rule. However, given an occurrence of the antecedent, the consequent may also occur closer to the antecedent, in the window $Win_{between}$, which should affect the confidence of the rule. This information may be important in some applications. In the example of social networks, the prediction of a negative event allows the company to act so as to prevent its occurrence. So, it is important to mine rules with a consequent that never occurs in $Win_{between}$: predicting a consequent at a given distance when it may appear closer is useless, even dangerous. Consequently, we have to take into consideration the occurrence of the consequent in $Win_{between}$. We introduce a new measure, the temporal confidence, which represents the probability that the consequent occurs in $Win_{end}$ and only in it. For an episode rule $R : P \Rightarrow Q$, this measure takes into account the support of $P \cdot Q$ when $Q$ occurs only in $Win_{end}$ (and not in $Win_{between}$), denoted $supp_{end}(P \cdot Q)$. The temporal confidence is defined as follows:

$$conf_t(P \Rightarrow Q) = \frac{supp_{end}(P \cdot Q)}{supp(P \cdot Q)} \qquad (2)$$
$conf_t(R) = 1$ if, for each occurrence of the consequent of $R$ in $Win_{end}$, no occurrence of the consequent is found in $Win_{between}$. The temporal confidence of the rules from the previous section is computed, and the rules with a temporal confidence above $minconf_t$ are kept. For example, let $w = 6$, $w_{begin} = 2$ and $w_{end} = 2$. The temporal confidence of the frequent and confident episode rule $R : A \Rightarrow E$ depends on the number of occurrences of $E$ in $Win_{between}$, which is equal to 1 ($E$ appears at timestamp $t_{10}$ in $Win_{between}$). Thus, $conf_t(R : A \Rightarrow E) = 1/2 = 0.5$. For $minconf_t = 0.5$, $R$ is temporally confident and is a rule formed by our algorithm.
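A hedged sketch of Equation (2), given the start timestamps of the occurrence windows of $P \cdot Q$ (the boundary conventions and names are ours):

```python
from typing import Dict, List, Set

def temporal_confidence(s: Dict[int, Set[str]], q: str,
                        starts: List[int], w: int,
                        w_begin: int, w_end: int) -> float:
    """starts: start timestamps t_s of the occurrence windows of P.Q.
    Returns the fraction of occurrences whose consequent q falls in Win_end
    and never in Win_between."""
    supp_pq = supp_end_only = 0
    for t_s in starts:
        in_end = any(q in s.get(t, set())
                     for t in range(t_s + w - w_end, t_s + w))
        in_between = any(q in s.get(t, set())
                         for t in range(t_s + w_begin, t_s + w - w_end))
        if in_end:
            supp_pq += 1
            supp_end_only += not in_between
    return supp_end_only / supp_pq if supp_pq else 0.0

# On the example above (starts [2, 7], w = 6, w_begin = 2, w_end = 2), E also
# occurs in Win_between of the window starting at 7 (t_10), so conf_t = 1/2.
```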
Figure 3: Episode rules mining algorithm.
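Since the pseudocode of Figure 3 is not reproduced here, the following rough driver shows how the sketches of the previous subsections could fit together. The control flow is our reading of Sections 3.2.1 to 3.2.4, not the authors' exact pseudocode; antecedent completion is only indicated.

```python
def mine_rules(s, minsupp, minconf, w, w_end):
    """Fix a prefix, fix a distant consequent, then (not shown) complete the
    antecedent inside Win_begin until the rule is confident or infrequent."""
    rules = []
    singletons = frequent_singletons(s, minsupp)          # initialization
    for prefix, occs in singletons.items():               # prefix identification
        candidates = candidate_consequents(s, occs, w, w_end, minsupp)
        for q, supp_end in candidates.items():            # consequent identification
            if supp_end / len(occs) >= minconf:           # Equation (1)
                rules.append(([prefix], q))
            # else: complete the antecedent within Win_begin (Section 3.2.4)
    return rules
```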
4 EXPERIMENTAL RESULTS
In this section, we evaluate our algorithm through
the study of the characteristics of the episode rules
formed, as well as its performance in a prediction
task.
4.1 Dataset
The dataset we use is made up of 27,612 messages extracted from blogs about finance. Messages are annotated using the Temis software (http://www.temis.com). Each message is represented by its corresponding set of annotations (items). For example, the message "Not only my bank propose the best, but also online banks, so how to optimize my savings? accounts in other banks? life insurance? i need more info." is annotated with the following items: {(online banks, negative), (savings, neutral), (life insurance, neutral), (needs, neutral)}, where each item is associated with an opinion degree.
In this dataset, messages are annotated with 4.8 items on average (ranging from 1 to 50, with a median of 4). There are about 4,000 distinct annotations (items), with an average frequency of 88.5; 1,981 of these items have a frequency equal to 1. These items are automatically filtered out in the initialization phase of our algorithm.
4.2 Characteristics of the Resulting
Rules
4.2.1 Initialization Phase
In this phase, frequent 1-tuple episodes are extracted (a 1-tuple episode is made up of one or more items). We fix $minsupp$ for these 1-tuple episodes to 30. This phase results in 652 frequent 1-tuple episodes. Table 2 shows that the 1-tuple episodes are made up of one to three items only. They are mainly made up of a single item (76% of them), and these also have a high support (149.7 on average).

Table 2: 1-tuple episodes: length and support.

length (#items)   number (%)    support: min   max   mean    median
1                 498 (76.4)    30             797   149.7   89.5
2                 147 (22.5)    30             376   71.5    55
3                 7 (1)         32             54    44.4    46
In order to study the episode rules formed, we vary $minsupp$, $minconf$ and $w_{between}$ one at a time, while fixing the others.
4.2.2 Varying minsupp
In Figure 4, we vary $minsupp$ from 10 to 35 to study the number of rules. The other parameters remain fixed: $minconf = 0.4$, $w = 40$ (with $w_{begin} = 20$, $w_{between} = 10$ and $w_{end} = 10$). As expected, the smaller $minsupp$, the higher the number of rules. When $minsupp$ is fixed to 10, the number of rules is high (about $10^4$). Recall that the number of 1-tuple episodes is only 652. However, the number of rules decreases dramatically when $minsupp$ is increased: only 1,200 rules when $minsupp = 15$ and 250 when $minsupp = 20$. These low values are due to the small average frequency of the 1-tuple episodes (about 88). In addition, these rules represent a temporal dependence between the antecedent and the consequent which, in these experiments, is at least $w_{between} = 10$. This may explain the low number of rules. A thorough study shows that their average confidence increases with $minsupp$.

Figure 4: Number of rules vs. $minsupp$.
4.2.3 Varying minconf

We vary $minconf$ from 0.1 to 0.5. Figure 5 presents the number of episode rules according to the value of $minconf$ ($minsupp = 20$, $w = 40$, with $w_{begin} = 20$, $w_{between} = 10$ and $w_{end} = 10$). The number of rules is particularly high with $minconf = 0.2$. This is explained by the way our rules are formed. When the confidence of a rule does not exceed $minconf$, its antecedent is extended. Given an antecedent $ant$, events in $P_{begin}(ant)$ (up to 652) are appended to it, resulting in a large number of candidate rules. Some of them are confident, which explains the increase in the number of rules. When $minconf$ exceeds 0.2, the number of rules decreases.

Figure 5: Number of rules vs. $minconf$.
Table 3 presents the length of the antecedent of the rules (in number of events), according to $minconf$. The maximum length of an antecedent is three. Thus, our algorithm forms rules with a small antecedent, which was one of its goals. The average length of the antecedent increases with $minconf$: when $minconf = 0.1$, most of the rules have an antecedent of length 1, whereas when $minconf = 0.3$, most of the rules have an antecedent of length 2. This was expected, as minimal antecedents are searched for. Indeed, when a frequent rule has a confidence below $minconf$, its antecedent is extended until it is confident or no longer frequent. So, the higher $minconf$, the larger the antecedents. A thorough study shows that the average length of the occurrence windows of the antecedents is 8 timestamps (for antecedents of length 2 and 3), which is smaller than the span of the antecedent ($w_{begin}$). We conclude that our algorithm succeeds in forming rules with a distant consequent, a small antecedent (in length and in time) and a relatively high confidence.
Table 3: Antecedent length when varying $minconf$.

minconf   #Rules    %Rules (ant = 1)   (ant = 2)   (ant = 3)
0.1       16,850    57.2               42.84       0
0.2       31,972    4.2                95.6        0.2
0.3       5,072     0.9                97.5        1.6
0.4       251       0.8                90.9        8.3
0.5       9         0                  100         0
4.2.4 Varying $w_{between}$

We now focus on the number of rules formed according to $w_{between}$ ($w$ and $w_{begin}$ remain fixed), presented in Figure 6, where $minsupp = 20$ and $minconf = 0.4$.
Two values of $w$ are studied: $w = 40$ (with $w_{begin} = 20$) and $w = 100$ (also with $w_{begin} = 20$). Notice that the case $w_{between} = 0$ is similar to the state of the art. We note that the larger $w_{between}$, the smaller the number of rules. Two reasons may explain this decrease. First, when $w_{between}$ increases, $w_{end}$ (the window in which the consequent is searched) decreases, and so does the number of consequents studied. Second, the larger $w_{between}$, the more distant the consequent, and thus the lower the probability of a dependence between the antecedent and the consequent. However, even with a large value of $w_{between}$, some rules are formed: 210 rules when $w_{between} = 70$. We conclude that there is actually a temporal dependence between messages in blogs. When the minimal distance between the antecedent and the consequent is 50, more than 140k confident rules are formed: there is a strong dependence between messages at such a distance. For example, when $w = 100$, an episode rule (price, positive), (information, positive) $\Rightarrow$ (buy, positive) means that when someone talks about the price of an article and then asks for information, he/she will buy this article after some time. Thus, we have time to recommend similar articles or to propose a credit to buy it.
Figure 6: Number of rules vs. $w_{between}$, for $w = 40$ and $w = 100$.
Influence of $minconf_t$: In this section we study the temporal confidence of the resulting rules. Table 4 presents the evolution of the temporal confidence according to $w_{between}$. We remark that the smaller $w_{between}$, the higher the temporal confidence: the consequent rarely occurs in the gap between the antecedent and the consequent when this gap is small, which was expected. When $w_{between} = 30$, the average temporal confidence is 0.6, which is quite high. A thorough study shows that among the $2.7 \times 10^5$ rules (see Figure 6), about 40 have a temporal confidence equal to 1 (the consequent never occurs in $Win_{between}$) and 1,400 rules have a temporal confidence higher than 0.9 (the consequent appears in $Win_{between}$ in less than 10% of the cases). This shows that in this dataset there is a strong temporal dependence between events, and that some events are interdependent at a distance of 30. So, when exploiting the temporal confidence as a filter, a great number of rules remain.
Table 4: $w_{between}$ vs. temporal confidence ($conf_t$).

w_between   min conf_t   max conf_t   mean conf_t   median conf_t
70          0            0.5          0.2           0.2
50          0.2          0.9          0.4           0.4
30          0.3          1            0.6           0.6
10          0.2          1            0.9           0.9
4.3 Performance
In this section we focus on the accuracy of the rules formed when they are used to predict events, and we compare our rules with those of a traditional algorithm.
We evaluate the accuracy of our algorithm with the traditional recall and precision measures. The episode rules are trained on the first 75% of messages and tested on the remaining 25%. Figure 7 presents the resulting precision and recall at 20. We fix $minsupp = 20$, $minconf = 0.4$, $w = 100$ and $w_{begin} = 20$, and vary $w_{between}$ from 10 to 70. First of all, note that two precision and recall values obtained with two different values of $w_{between}$ are not directly comparable, as they are not computed on the same data (the windows $Win_{end}$ on which they are computed vary in size). Both precision and recall curves decrease as $w_{between}$ increases. This was expected, as the number of rules decreases. When $w_{between} = 70$ (and $w_{end} = 10$), both precision and recall values are quite low. This was also expected, as the rules aim at predicting events distant by at least 70, in an occurrence window of length $w_{end} = 10$; the prior probability of predicting events accurately is low. Let us now consider $w_{between} = 30$, as in the previous section. We can see that both precision and recall values are quite high: when an event is predicted, in 37% of the cases it actually occurs, and events that occur in the sequence are predicted by our rules in 70% of the cases.
Figure 7: Precision@20 and recall@20 vs. $w_{between}$.
Comparison with Minepi: Contrary to traditional algorithms, our algorithm forces a minimum gap of length $w_{between}$ between the antecedent and the consequent of an episode rule. Our algorithm can be compared to traditional algorithms when $w_{between} = 0$. Since we apply the minimal occurrence-based frequency, we choose to compare it to the well-known Minepi (Mannila et al., 1997). For $minsupp = 20$, $minconf = 0.4$ and $w = 40$, Minepi forms more than 136,000 episode rules, whereas our algorithm (when $w_{between} = 0$) extracts about 40,000 episode rules (70% fewer). This decrease is due to two reasons. First, the constraint on the position of the consequent in our episode rules (in this case the distance between the antecedent and the consequent is at least 20, even if $w_{between} = 0$) makes the number of rules resulting from our algorithm lower (and their support lower as well). Second, our algorithm aims at forming minimal rules, so few rules with a large antecedent are formed. We remark that 25% of the rules extracted by Minepi have an antecedent of length 3 or more, whereas this rate is only 1.8% for our algorithm.
Here is an example of an episode rule extracted both by our algorithm ($w_{between} = 0$) and by Minepi: (credit, positive), (consultant, positive) $\Rightarrow$ (loan subscription, positive).

Here is a rule that was not extracted by our algorithm, as it does not satisfy the desired characteristics (minimal antecedent): (consultant, neutral), (interest rate, positive) $\Rightarrow$ (request interest rate 0, positive), where the antecedent occurs within 5 timestamps and the consequent occurs at the 7th timestamp. This rule is useful in traditional cases of event prediction (prediction of close events). However, it does not fit our objective of early prediction of distant events, as the antecedent is long both in time and in number of events, and the consequent is too close to the antecedent.
Concerning the running time, our algorithm runs 5 times faster than Minepi when $w = 100$, and 4 times faster when $w = 40$. This gain is due to two factors. The first is related to the consequent, which is fixed at an early stage of the algorithm and allows infrequent rules to be filtered out early in the process. The second is due to the fact that our algorithm mines rules with a minimal antecedent, which avoids some iterations once a confident rule is found.
5 CONCLUSION
In this paper, we propose an algorithm that mines episode rules in order to predict distant events. To achieve our goal, the algorithm mines serial episode rules with a distant consequent. We determine several characteristics of the episode rules formed: a minimal antecedent, and a consequent temporally distant from the antecedent. A new confidence measure, the temporal confidence, is proposed to evaluate the confidence of rules with a distant consequent. Our algorithm is evaluated on an event sequence of annotated social network messages. We show that it is efficient in extracting episode rules with the desired characteristics and in predicting distant events.
Since we use data from social networks, we plan to use multi-thread sequences. This means that we will construct a sequence for each thread of messages (user message thread, topic message thread, discussion thread, etc.) and run the algorithm on each one. Using multi-thread sequences allows building more diverse episode rules, which are altogether more significant. The presence of a rule in several threads will increase its confidence.
KDIR2014-InternationalConferenceonKnowledgeDiscoveryandInformationRetrieval
12
ACKNOWLEDGEMENTS
This research is supported by Crédit Agricole S.A.
REFERENCES
Achar, A., Sastry, P., et al. (2013). Pattern-growth based
frequent serial episode discovery. Data & Knowledge
Engineering, 87:91–108.
Agrawal, R., Imieliński, T., and Swami, A. (1993). Mining association rules between sets of items in large databases. In ACM SIGMOD Record, volume 22, pages 207–216. ACM.
Cho, C.-W., Wu, Y.-H., Yen, S.-J., Zheng, Y., and Chen,
A. L. (2011). On-line rule matching for event predic-
tion. The VLDB Journal, 20(3):303–334.
Gan, M. and Dai, H. (2011). Fast mining of non-derivable
episode rules in complex sequences. In Modeling De-
cision for Artificial Intelligence. Springer.
Huang, K.-Y. and Chang, C.-H. (2008). Efficient mining of
frequent episodes from complex sequences. Informa-
tion Systems, 33(1):96–114.
Laxman, S., Sastry, P., and Unnikrishnan, K. (2007). A
fast algorithm for finding frequent episodes in event
streams. In 13th ACM SIGKDD. ACM.
Laxman, S. and Sastry, P. S. (2006). A survey of temporal
data mining. Sadhana, 31(2).
Luo, J. and Bridges, S. M. (2000). Mining fuzzy associa-
tion rules and fuzzy frequency episodes for intrusion
detection. Int. J. of Intelligent Systems, 15(8):687–
703.
Mannila, H., Toivonen, H., and Verkamo, A. I. (1997). Dis-
covery of frequent episodes in event sequences. Data
Mining and Knowl. Discovery, 1(3):259–289.
Méger, N. and Rigotti, C. (2004). Constraint-based mining of episode rules and optimal window sizes. In PKDD 2004, pages 313–324. Springer.
Neeraj, S. and Swati, L. S. (2012). Overview of non-
redundant association rule mining. Research Journal
of Recent Sciences ISSN, 2277:2502.
Ng, A. and Fu, A. W.-C. (2003). Mining frequent episodes
for relating financial events and stock trends. In Ad-
vances in Knowledge Discovery and Data Mining,
pages 27–39. Springer.
Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L. (1999).
Discovering frequent closed itemsets for association
rules. In Database Theory-ICDT, pages 398–416.
Springer.
Rahal, I., Ren, D., Wu, W., and Perrizo, W. (2004). Mining
confident minimal rules with fixed-consequents. In
16th IEEE ICTAI 2004.
EpisodeRulesMiningAlgorithmforDistantEventPrediction
13