TIME CONSTRAINTS EXTENSION ON FREQUENT SEQUENTIAL PATTERNS

A. Ben Zakour, M. Sistiaga

2MoRO, Bidart, France

S. Maabout, M. Mosbah

Université de Bordeaux, Bordeaux, France

Keywords:

Frequent pattern, Time constraint, Relax constraint, Sequential pattern mining.

Abstract:

Unlike frequent set extraction, for which only a minimum support condition must be met, sequential patterns must also satisfy time constraints. Commonly, for two events to be considered successive, they must either respect a minimum and maximum time gap or fall within a window of a given size. In this paper, we introduce a new definition of “interesting sequences”. Under this definition, temporal patterns can be customized by the user through a sliding window, so that the chronology of events in the extracted sequences need not strictly follow that of the original event sequence. This definition is incorporated into a conventional algorithm (Fournier-Viger et al., 2008). The extracted patterns carry interval timestamps and offer an informative summary of the original data.

1 INTRODUCTION

In Sequential Pattern Mining (SPM), the use of the temporal factor depends on how much this component is needed in the results. Existing extraction techniques consider only the positional component, that is, the “succession” of events in a sequence. In some cases, the results can be “exuberant”. For example, in market basket analysis, suppose that one customer buys product A and then product B a day later, and another buys product A and then product B one month later. Based on these two sequences, the extracted pattern is “a customer who buys product A then buys product B”. This conclusion is not necessarily representative of the second customer, and the extracted information is not representative of the underlying data. The temporal aspect was introduced into SPM in (Srikant and Agrawal, 1996). Their constraints aim at (1) grouping “close” events into a single transaction and considering them as simultaneous, and (2) bounding the minimal and maximal time gaps between two successive transactions. Considering the importance of the temporal component in the interpretation of patterns, (Hirate and Yamana, 2006) introduce temporal constraints to extract temporal patterns. They extend PrefixSpan (Pei et al., 2004) with: (1) an interval function that maps timestamps to discrete levels, (2) maximal and minimal time gaps between two successive transactions, and (3) min_whole_interval and max_whole_interval, which bound the minimum and maximum temporal spread of a sequence. Both works introduce time constraints to be satisfied by the extracted sequences. Note that in both cases the extracted sequences respect the chronological order that appears in the underlying data. However, in some applications, the exact chronological ordering of close events is not necessarily important. For example, the two temporal sequences ⟨(0, A)(1, B)(2, C)⟩ and ⟨(0, A)(1, C)(2, B)⟩ may represent the same information, namely that A, B and C occur close to each other in the interval [0, 2]. The two algorithms presented above do not capture this kind of information.

In this paper, we present a new definition of interesting patterns. The idea is to apply a temporal relaxation on itemsets by using a backward sliding window size constraint. Such a constraint makes it possible to treat all neighboring events within a time window as simultaneous.

The paper is organized as follows. First, we

present a brief state of the art. Then, we give a new

deﬁnition of interesting sequences. The third section


Ben Zakour A., Sistiaga M., Maabout S. and Mosbah M..

TIME CONSTRAINTS EXTENSION ON FREQUENT SEQUENTIAL PATTERNS.

DOI: 10.5220/0003098702810287

In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR-2010), pages 281-287

ISBN: 978-989-8425-28-7

 2010 SCITEPRESS (Science and Technology Publications, Lda.)

describes a modification of the algorithm presented in (Fournier-Viger et al., 2008) to extract these sequences. Finally, a short evaluation of our approach is presented, followed by a conclusion and perspectives in the last section.

2 RELATED WORK

In this section, we present some general definitions and detail two SPM works.

2.1 Terminology

A transaction is a timestamped itemset (or event set). It is denoted by $I_j = \{i_1, i_2, \ldots, i_k\}$. $time(I_j)$ denotes the timestamp of $I_j$. A temporal sequence is a timestamp-ordered sequence of itemsets. It is denoted by $S = \langle (t_1, I_1), (t_2, I_2), \ldots, (t_n, I_n) \rangle$ where $I_j$ is a transaction and $t_j = time(I_j)$. Hereafter, the timestamp $t_j$ is expressed relative to the occurrence time of the first transaction of the sequence. Thus, it represents the time lag between the transactions $I_1$ and $I_j$. A sequence database is a collection of sequences where each sequence has a unique identifier id_sequence. The support of a sequence s in a sequence database SDB, denoted by $support_{SDB}(s)$, is the percentage of sequences in SDB that contain s. Given a minimal support minsupp, s is frequent in SDB iff $support_{SDB}(s) \geq minsupp$. Besides minsupp, and depending on the business needs, the user may set time constraints that should be satisfied by the extracted patterns. The time constraints we consider are:

mingap and maxgap: respectively the minimal and maximal time gap between two successive transactions: $time(I_{i+1}) - time(I_i) \geq mingap$ and $time(I_{i+1}) - time(I_i) \leq maxgap$.

Example 1. Let $s = \langle (0, I_1)(1, I_2)(10, I_3) \rangle$. If mingap = 2 then transactions $I_1$ and $I_2$ are not considered as successive because they are “too close”. If maxgap is set to 5 then $I_2$ and $I_3$ are not considered as successive because they are too distant.

min_whole_interval and max_whole_interval: respectively the minimal and maximal whole interval constraints. Let n be the number of itemsets in a sequence. Then: $time(I_n) - time(I_1) \geq min\_whole\_interval$ and $time(I_n) - time(I_1) \leq max\_whole\_interval$.

Example 2. Let min_whole_interval = 1, max_whole_interval = 4 and $s = \langle (0, I_1)(1, I_2) \rangle$. s satisfies the min_whole_interval constraint and the max_whole_interval constraint.
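The gap and whole-interval constraints above can be checked directly on the list of timestamps of a sequence. The following is a minimal sketch, not the authors' implementation; the function names `violated_gaps` and `whole_interval_ok` are hypothetical, and positions are 0-based, so the pair `(0, 1)` refers to the first and second transactions.

```python
from typing import List, Tuple


def violated_gaps(times: List[int], mingap: int, maxgap: int) -> List[Tuple[int, int]]:
    """Return the (i, i+1) positions of consecutive transactions whose
    time gap falls outside [mingap, maxgap]."""
    bad = []
    for i in range(len(times) - 1):
        gap = times[i + 1] - times[i]
        if gap < mingap or gap > maxgap:
            bad.append((i, i + 1))
    return bad


def whole_interval_ok(times: List[int], min_whole: int, max_whole: int) -> bool:
    """Check that the spread between the first and last transaction
    lies in [min_whole, max_whole]."""
    spread = times[-1] - times[0]
    return min_whole <= spread <= max_whole


# Example 1: timestamps 0, 1, 10 with mingap = 2 and maxgap = 5.
print(violated_gaps([0, 1, 10], mingap=2, maxgap=5))  # [(0, 1), (1, 2)]
# Example 2: timestamps 0, 1 with bounds 1 and 4.
print(whole_interval_ok([0, 1], 1, 4))                # True
```

In Example 1 both pairs are rejected, the first by mingap (gap of 1) and the second by maxgap (gap of 9).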

Window size (ws): it allows events (items) occurring in different transactions to be considered as simultaneous (as if they belonged to a single transaction). These transactions must be close to each other relative to the size of the window.

Example 3. Let T = (0, AB) and s = ⟨(0, A)(2, B)(3, C)⟩. If ws = 2 then s contains T, while if ws = 1, T is not contained in s.
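The window size constraint of Example 3 can be illustrated with a small check. This is only a sketch under the assumption that a sequence is represented as a list of `(timestamp, set of items)` pairs; the name `window_contains` is hypothetical.

```python
from typing import List, Set, Tuple

Sequence = List[Tuple[int, Set[str]]]


def window_contains(seq: Sequence, itemset: Set[str], ws: int) -> bool:
    """True if some run of consecutive transactions spanning at most `ws`
    time units jointly covers `itemset`."""
    for start in range(len(seq)):
        t_start = seq[start][0]
        covered: Set[str] = set()
        for t, items in seq[start:]:
            if t - t_start > ws:
                break  # the window starting at `start` is exhausted
            covered |= items
            if itemset <= covered:
                return True
    return False


s = [(0, {"A"}), (2, {"B"}), (3, {"C"})]
print(window_contains(s, {"A", "B"}, ws=2))  # True
print(window_contains(s, {"A", "B"}, ws=1))  # False
```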

2.2 GSP

In (Srikant and Agrawal, 1996), the authors improved their Apriori algorithm (Agrawal and Srikant, 1994) to overcome the absence of time constraints in the extraction of timeless patterns. This algorithm relaxes the definition of a transaction by using a window notion and by integrating mingap and maxgap constraints.

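Definitions. Let $d = \langle d_1 \ldots d_m \rangle$ and $s = \langle s_1 \ldots s_n \rangle$ be two sequences. d contains the sequence s with constraints if and only if:

Window size constraint: there exist integers $l_1 \leq u_1 < l_2 \leq u_2 < \ldots < l_n \leq u_n$ such that $s_i \subseteq \bigcup_{k=l_i}^{u_i} d_k$, $1 \leq i \leq n$, and $time(d_{u_i}) - time(d_{l_i}) \leq window\_size$, $1 \leq i \leq n$.

mingap and maxgap constraints: there exist $l_1 \leq u_1 < l_2 \leq u_2 < \ldots < l_n \leq u_n$ such that $s_i \subseteq \bigcup_{k=l_i}^{u_i} d_k$, $1 \leq i \leq n$, $time(d_{l_i}) - time(d_{u_{i-1}}) \geq mingap$ for $2 \leq i \leq n$, and $time(d_{u_i}) - time(d_{l_{i-1}}) \leq maxgap$ for $2 \leq i \leq n$. $time(d_{l_i})$ is the start time of $s_i$ and $time(d_{u_i})$ is the end time of $s_i$.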

Algorithm Description. The main goal is to find all frequent sequences satisfying all user constraints. GSP is a level-wise algorithm: first, it recovers $L_1$, the set of frequent 1-sequences. It generates the candidate sequences of size k + 1 by self-joining $L_{k-1}$

2.3 GSPM

(Hirate and Yamana, 2006) present an approach for extracting frequent temporal sequences from a database of temporal sequences. The algorithm applies time constraints different from those applied in GSP. The main goal is to extract frequent temporal patterns from a sequence database by integrating a new time constraint that aligns timestamps falling in the same interval to a single value. The algorithm is an improvement of PrefixSpan (Pei et al., 2004), in which data sequences may be either timestamped or just sorted.

Definition. Let f be a time function. It maps time intervals to integers. Let f be defined as follows:

$$
f(x) =
\begin{cases}
0 & \text{if } x \in [val_1, val_2[ \\
1 & \text{if } x \in [val_2, val_3[ \\
\ldots & \\
n-1 & \text{if } x \in [val_n, val_{n+1}[
\end{cases}
$$
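A step function of this kind can be sketched with a binary search over the sorted interval bounds. The sketch assumes half-open intervals $[val_k, val_{k+1}[$ numbered from 0; the helper name `make_step_function` and the bounds used in the demonstration are hypothetical and not taken from the paper.

```python
from bisect import bisect_right
from typing import Callable, List


def make_step_function(bounds: List[float]) -> Callable[[float], int]:
    """Build f from sorted bounds [val_1, ..., val_{n+1}] so that
    f(x) = k - 1 whenever val_k <= x < val_{k+1}."""
    def f(x: float) -> int:
        if not bounds[0] <= x < bounds[-1]:
            raise ValueError("timestamp outside the covered range")
        # bisect_right counts the bounds that are <= x.
        return bisect_right(bounds, x) - 1
    return f


f = make_step_function([0, 10, 20, 30])  # illustrative bounds only
print(f(0), f(9), f(10), f(29))          # 0 0 1 2
```

Two timestamps are then treated as aligned exactly when f maps them to the same integer.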

Let $a = \langle (t_1, X_1), (t_2, X_2), \ldots, (t_m, X_m) \rangle$ and $b = \langle (t'_1, X'_1), (t'_2, X'_2), \ldots, (t'_n, X'_n) \rangle$ be two time sequences.

KDIR 2010 - International Conference on Knowledge Discovery and Information Retrieval


b is a subsequence of a w.r.t. f if and only if there exist $1 \leq i_1 < i_2 < \ldots < i_n \leq m$ such that
• $X'_k \subseteq X_{i_k}$ for all $1 \leq k \leq n$ and
• $f(t'_k) = f(t_{i_k})$

Let SDB be a sequence database, $\alpha = \langle (t_{1,1}, a_1), (t_{1,2}, a_2), \ldots, (t_{1,m}, a_m) \rangle$ a sequence and $X_\beta$ an itemset. If there exists an integer j ($1 \leq j \leq m$) such that $X_\beta \subset a_j$ and $I(t_{1,\beta}) = I(t_{1,j})$, then the prefix of α with regard to $X_\beta$, $I(t_{1,\beta})$ is defined by:

$prefix(\alpha, X_\beta, I(t_{1,\beta})) = \langle (t_{1,1}, a_1), (t_{1,2}, a_2), \ldots, (t_{1,j}, a_j) \rangle$

The suffix of α with regard to $X_\beta$, $I(t_{1,\beta})$ represents the sequence of events that occur after $X_\beta$. It is defined by:

$suffix(\alpha, X_\beta, I(t_{1,\beta})) = \langle (t_{j,j}, a_j), (t_{j,j+1}, a_{j+1}), \ldots, (t_{j,m}, a_m) \rangle$.

Algorithm Description. The approach is an extension of PrefixSpan (Pei et al., 2004). The first step is to recover the frequent 1-sequences. Then, longer patterns are extracted through a pattern-growth process. They are obtained by a projection process that discovers, for each pattern built so far, its possible continuations in the relevant part of the SDB. The projection of SDB on a pattern α is denoted by $SDB|_\alpha$ and defined by the equation: $SDB|_\alpha = \{ is \in SDB \mid is = suffix(\gamma, 0, I) \}$ with $\gamma \in SDB$. Iterations stop when there is no possible continuation or no more frequent items.

These two works extract frequent patterns by introducing time constraints. They relax the classical definition of a transaction to better represent the underlying data. The relaxation is introduced through the grouping window size in (Srikant and Agrawal, 1996) and through the temporal function in (Hirate and Yamana, 2006). Despite these relaxation methods, it remains impossible to associate unordered events.
In both methods, with a window size equal to 2 or with an equivalent time function, the sequence ⟨(0, ABC)⟩ does not contain the pattern ⟨(0, B)(1, AC)⟩. The algorithms proceed item by item and, when applying the projection, keep as continuations only the events that follow the current item. In the next section, we present our approach, which makes it possible to take this kind of data into account.

3 INTERESTING SEQUENCES: DEFINITIONS

In this paper, we propose to define a new type of interesting sequences extracted from a database of temporal sequences. The difference with the patterns extracted in the works presented above is that the definition of a transaction is relaxed without taking into account the order of items within itemsets.

An “interesting sequence” is denoted by $sp = \langle (\delta t_{1,1}, I_1), (\delta t_{1,2}, I_2), \ldots, (\delta t_{1,m}, I_m) \rangle$, where $I_j$ is an itemset. $\delta t_{1,j}$ is a transaction timestamp: it is the temporal interval in which the events of $I_j$ occur. This interval is characterized by min_time (respectively max_time), its lower (resp. upper) bound. $\delta t_{1,j}$ is a timestamp relative to the occurrence of $I_1$, where min_time and max_time are respectively the minimum and maximum time at which the events of $I_j$ may occur after those of $I_1$.

Example 4. The sequence ⟨(0, A)([2, 3], B)⟩ means that B occurs at the earliest 2 time units after A and at the latest 3 time units after A.

We consider the following time constraints:
• mingap and maxgap:
$min(\delta t_{i+1}) - max(\delta t_i) \geq mingap$
$max(\delta t_{i+1}) - min(\delta t_i) \leq maxgap$
• min_whole_interval and max_whole_interval:
$min(\delta t_m) - max(\delta t_1) \geq min\_whole\_interval$
$min(\delta t_m) - max(\delta t_1) \leq max\_whole\_interval$
• Window size:
$max(\delta t_i) - min(\delta t_i) \leq ws$
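The gap and window size constraints on interval timestamps can be sketched as follows. This is an illustrative check under the assumption that a pattern is given as a list of `(min_time, max_time)` intervals, one per itemset; the function name `satisfies_interval_constraints` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (min_time, max_time)


def satisfies_interval_constraints(intervals: List[Interval],
                                   mingap: int, maxgap: int, ws: int) -> bool:
    """Check the window size, mingap and maxgap constraints on the interval
    timestamps of a pattern."""
    # Window size: every interval is at most ws wide.
    if any(hi - lo > ws for lo, hi in intervals):
        return False
    for (lo_i, hi_i), (lo_next, hi_next) in zip(intervals, intervals[1:]):
        if lo_next - hi_i < mingap:   # min(dt_{i+1}) - max(dt_i) >= mingap
            return False
        if hi_next - lo_i > maxgap:   # max(dt_{i+1}) - min(dt_i) <= maxgap
            return False
    return True


pattern = [(0, 2), (5, 5), (5, 6)]
print(satisfies_interval_constraints(pattern, mingap=0, maxgap=10, ws=2))  # True
print(satisfies_interval_constraints(pattern, mingap=4, maxgap=10, ws=2))  # False
```

The second call fails because the first two intervals are only 5 − 2 = 3 time units apart.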

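Let $s = \langle (t_1, I_1), \ldots, (t_n, I_n) \rangle$ and $s' = \langle (t'_1, s_1), \ldots, (t'_m, s_m) \rangle$. s contains s', denoted $s \models s'$, if and only if there exist integers $l_1 \leq u_1 < l_2 \leq u_2 < \ldots < l_m \leq u_m$ such that, for all $1 \leq i \leq m$:

$s_1 \subset \left(\bigcup_{k=l_1}^{u_1} I_k\right), s_2 \subset \left(\bigcup_{k=l_2}^{u_2} I_k\right), \ldots, s_m \subset \left(\bigcup_{k=l_m}^{u_m} I_k\right)$

$max\left(time\left(\bigcup_{k=l_i}^{u_i} I_k\right)\right) - min\left(time\left(\bigcup_{k=l_i}^{u_i} I_k\right)\right) \leq ws$

$time(s_i) \in \left[min\left(time\left(\bigcup_{k=l_i}^{u_i} I_k\right)\right), max\left(time\left(\bigcup_{k=l_i}^{u_i} I_k\right)\right)\right]$

$min\left(time\left(\bigcup_{k=l_{i+1}}^{u_{i+1}} I_k\right)\right) - max\left(time\left(\bigcup_{k=l_i}^{u_i} I_k\right)\right) \geq 0$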

Example 5. Let $s_1$ = ⟨(0, A)(1, C)(2, B)(5, CDE)(6, F)⟩, $s_2$ = ⟨(0, AB)(5, FD)⟩ and $s_3$ = ⟨(0, A)(12, CD)⟩. If the window size ws is equal to 2, then $s_1$ contains $s_2$. If mingap = 4 then $s_1$ does not contain $s_2$. Finally, if maxgap = 10 then $s_1$ does not contain $s_3$: the gap between the maximal timestamp of (CD) and the minimal timestamp of (A) in $s_3$ is greater than maxgap (12 > 10).
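The first two statements of Example 5 can be reproduced with a small backtracking search. This is a simplified sketch, not the authors' procedure: it ignores the pattern's own timestamps (the third condition of the definition) and only checks that each pattern itemset is covered by a run of consecutive transactions no wider than ws, with the runs in order and the gap constraints applied between them. The name `contains` and its signature are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Sequence = List[Tuple[int, Set[str]]]


def contains(seq: Sequence, pattern: List[Set[str]], ws: int,
             mingap: int = 0, maxgap: Optional[int] = None) -> bool:
    """Search for index ranges [l, u] in seq, one per pattern itemset."""
    def search(p: int, start: int, prev: Optional[Tuple[int, int]]) -> bool:
        if p == len(pattern):
            return True
        for l in range(start, len(seq)):
            covered: Set[str] = set()
            for u in range(l, len(seq)):
                t_lo, t_hi = seq[l][0], seq[u][0]
                if t_hi - t_lo > ws:
                    break                      # run wider than the window
                covered |= seq[u][1]
                if not pattern[p] <= covered:
                    continue                   # itemset not covered yet
                if prev is not None:
                    if t_lo - prev[1] < mingap:
                        continue
                    if maxgap is not None and t_hi - prev[0] > maxgap:
                        continue
                if search(p + 1, u + 1, (t_lo, t_hi)):
                    return True
        return False
    return search(0, 0, None)


s1 = [(0, {"A"}), (1, {"C"}), (2, {"B"}), (5, {"C", "D", "E"}), (6, {"F"})]
s2 = [{"A", "B"}, {"F", "D"}]
print(contains(s1, s2, ws=2))            # True
print(contains(s1, s2, ws=2, mingap=4))  # False
```

With mingap = 4 the search fails because the run covering FD starts at time 5, only 3 time units after the run covering AB ends at time 2.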

This definition makes it possible to extract patterns that cannot be considered frequent under classical constraints. The extracted patterns include, besides the classical frequent patterns, those that group frequent unordered events occurring within a window interval.

4 COMPUTING INTERESTING SEQUENCES

Problem Deﬁnition. Given a sequences

database, a support threshold minsupp, a window


size ws and time constraints: mingap, maxgap, min_whole_interval and max_whole_interval, find all interesting sequences.

Algorithm Description. The computation of “interesting” sequences is implemented through a modification of the algorithm SPMF (Fournier-Viger et al., 2008), which is an improvement of the PrefixSpan algorithm (Pei et al., 2004). Initially, the algorithm extracts the frequent items I from the sequence database. This provides the set $L_1 = \{ s = (0, I) \mid support(I) \geq minsupp \}$. Then, to find the continuations of each 1-sequence, SDB is projected onto each 1-sequence (0, I). This projection is designed to take into account the relaxation introduced by the window. The projection considers the possible continuations of an event that occurred at a time t: it goes back to the events occurring in the interval [t − ws, t] and concatenates them to the “classical” continuation. This interval makes it possible to treat events appearing before the current event as possible “simultaneous” continuations.

Let $\alpha = \langle (t_{1,1}, a_1), \ldots, (t_{1,m}, a_m) \rangle$ and a pair $(\delta t, X_\beta)$. If there exists j such that $1 \leq j \leq m$, $X_\beta \subset a_j$ and $t_{1,\beta} \in \delta t$, then the prefix of α with regard to $X_\beta$, $t_{1,\beta}$ is defined as:

$wprefix(\alpha, X_\beta, t_{1,\beta}) = \langle (t_{1,1}, a_1), (t_{1,2}, a_2), \ldots, (t_{1,j}, a_j) \rangle$

The suffix of α w.r.t. $X_\beta$, $t_{1,\beta}$ is defined as follows. If j = 1 then

$wsuffix(\alpha, X_\beta, t_{1,\beta}) = \langle (t_{j,j}, a_j), (t_{j,j+1}, a_{j+1}), \ldots, (t_{j,m}, a_m) \rangle$,

else it is equal to

$\langle (t_{j,k}, a_k), \ldots, (t_{j,j}, a_j \setminus X_\beta), (t_{j,j+1}, a_{j+1}), \ldots, (t_{j,m}, a_m) \rangle$, with $1 \leq k \leq m$ such that $t_{j,k} \leq ws$ and $t_{j,k-1} \geq ws$.

If j = 1, the projection of SDB on α = ⟨(0, I)⟩ is defined by:
$SDB|_\alpha = \{ ss = suffix(\gamma, I, ws) \}$, $\gamma \in SDB$
otherwise
$SDB|_\alpha = \{ ss = wsuffix(\gamma, I, ws) \}$, $\gamma \in SDB$

The database projection on an item is described in Algorithm 1. On each projection, frequent pairs are computed by the Find_Frequent_Pairs function. A pair (δt, I) is a combination of a temporal interval and an item. The interval is the period in which the item occurs in the projection. The width of this interval is at most equal to the window size. Once a frequent pair (δt, i) is identified, it is concatenated to the last generated pattern. If the resulting pattern satisfies the time constraints, then it is a new frequent sequence. This new pattern triggers a new iteration: the projection (SDB|α)|(δt, i) is computed and becomes the new search space for frequent pairs.
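The computation of frequent pairs can be sketched as follows. The paper does not detail Find_Frequent_Pairs, so this sketch rests on several assumptions: support is an absolute count of sequences, only the first occurrence of an item in each projected sequence is used, the interval of a pair spans the earliest and latest of those occurrences, and a pair whose interval is wider than ws is discarded. The name `find_frequent_pairs` and its parameters are hypothetical. The input below is the projection on ⟨(0, A)⟩ used in Example 6.

```python
from typing import Dict, List, Set, Tuple

Sequence = List[Tuple[int, Set[str]]]
Pair = Tuple[Tuple[int, int], str]


def find_frequent_pairs(projection: List[Sequence], min_count: int, ws: int) -> List[Pair]:
    """Return ((min_time, max_time), item) pairs for items that occur in at
    least `min_count` projected sequences within an interval no wider than ws."""
    first_seen: Dict[str, List[int]] = {}
    for seq in projection:
        seen: Set[str] = set()
        for t, items in seq:
            for item in items - seen:       # first occurrence in this sequence
                first_seen.setdefault(item, []).append(t)
            seen |= items
    pairs: List[Pair] = []
    for item in sorted(first_seen):
        times = first_seen[item]
        lo, hi = min(times), max(times)
        if len(times) >= min_count and hi - lo <= ws:
            pairs.append(((lo, hi), item))
    return pairs


projection = [
    [(1, {"C"}), (2, {"B"}), (5, {"C", "D", "E"}), (6, {"F"})],
    [(0, {"B"}), (5, {"D", "F"})],
]
print(find_frequent_pairs(projection, min_count=2, ws=2))
# [((0, 2), 'B'), ((5, 5), 'D'), ((5, 6), 'F')]
```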

Example 6. Let us consider Example 5. The projections of $s_1$ and $s_2$ w.r.t. ⟨(0, A)⟩ give respectively ($SDB|_{(0,A)}$): $s_1$: ⟨(1, C)(2, B)(5, CDE)(6, F)⟩, $s_2$: ⟨(0, B)(5, DF)⟩, where the frequent pairs are: ([0, 2], B), ([5, 5], D), and ([5, 6], F).


Algorithm 1: Projection.
Input: SDB, (δt, I)
foreach sequence s of SDB do
    foreach itemset I_k of s do
        if I ∈ I_k and t_k ∈ δt then
            if I_k \ {I} = ∅ then
                add to projection S = ⟨((t_k − ws), I_j), …, ((t_{k+1} − t_k), I_{k+1}), …, ((t_n − t_k), I_n)⟩;
            else
                add to projection S = ⟨((t_k − ws), I_j), …, (0, I_k \ {I}), ((t_{k+1} − t_k), I_{k+1}), …, ((t_n − t_k), I_n)⟩;


For the ([0, 2], B) iteration, the projection $SDB|_{([0,2],AB)}$ is: $s_1$: ⟨(−1, C)(3, CDE)(4, F)⟩, $s_2$: ⟨(5, DF)⟩, where the frequent pairs are: ([3, 5], D), ([4, 5], F).
• For the ([3, 5], D) iteration, the concatenation of ([0, 2], AB) and ([3, 5], D) is defined by:
– Items A and B are considered simultaneous and take place in [0, 2].
– Item D directly follows AB and takes place no earlier than three time units after AB, so D occurs no earlier than time 5 = (2+3). Moreover, D occurs within 5 time units after AB, so D occurs no later than time 5 = (0+5).
Thus, the projection on ([3, 5], D) provides $SDB|_{([0,2],AB)([5,5],D)}$ = $s_1$: ⟨(0, CE)(1, F)⟩, $s_2$: ⟨(0, F)⟩.
The frequent pair is ([0, 1], F). It provides the frequent sequence ⟨([0, 2], AB)([5, 5], D)([5, 6], F)⟩ and the following projection: $s_1$: ⟨(−1, CE)⟩. There is no frequent pair, so no new iteration is executed. The same process is executed for the pairs ([4, 5], F), ([5, 5], D) and ([5, 6], F).
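The successive projections of Example 6 can be reproduced with a short sketch of the backward-window projection. It is an interpretation of Algorithm 1 under stated assumptions, not the authors' code: the anchor is the first itemset containing the item whose timestamp lies in δt, transactions older than the anchor time minus ws are dropped, remaining timestamps are made relative to the anchor, and the item is removed from the anchor itemset. The name `project` is hypothetical.

```python
from typing import List, Optional, Set, Tuple

Sequence = List[Tuple[int, Set[str]]]
Interval = Tuple[int, int]


def project(seq: Sequence, item: str, delta: Interval, ws: int) -> Optional[Sequence]:
    """Project `seq` on (delta, item) with a backward window of size ws.
    Returns None if the sequence does not support the pair."""
    for k, (t_k, items_k) in enumerate(seq):
        if item in items_k and delta[0] <= t_k <= delta[1]:
            out: Sequence = []
            for j, (t, items) in enumerate(seq):
                if t < t_k - ws:
                    continue                     # before the backward window
                rest = items - {item} if j == k else set(items)
                if rest:
                    out.append((t - t_k, rest))  # time relative to the anchor
            return out
    return None


s1 = [(0, {"A"}), (1, {"C"}), (2, {"B"}), (5, {"C", "D", "E"}), (6, {"F"})]
p_a = project(s1, "A", (0, 0), ws=0)    # (1,C)(2,B)(5,CDE)(6,F)
p_ab = project(p_a, "B", (0, 2), ws=2)  # (-1,C)(3,CDE)(4,F)
p_abd = project(p_ab, "D", (3, 5), ws=2)  # (0,CE)(1,F)
print(project(p_abd, "F", (0, 1), ws=2))  # the single itemset (-1, CE)
```

Each intermediate result matches the projection of $s_1$ given at the corresponding step of Example 6.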

Conclusion. The approach we have presented provides frequent temporal patterns. Their timestamps take the form of intervals whose widths are adjustable by the user. These intervals give an approximation of the occurrence time of events. Like GSP (Srikant and Agrawal, 1996), our approach takes as input a set of sequences, a support threshold and time constraints. The two main differences between the two approaches are: (1) the extraction process used: the effectiveness of PrefixSpan over GSP was demonstrated in various works (Fournier-Viger et al., 2008) (Hirate and Yamana, 2006) (Pei et al., 2004); (2) the quality of the data: GSP patterns are timeless, and in some areas the lack of timestamps is a major handicap to understanding and interpreting the data. In addition, the number of patterns returned by our approach


is larger. Indeed, the backward window makes it possible to extend continuation patterns with those containing unordered events within a window-size interval. Thus, where GSP finds no frequent pattern, our approach searches the backward window for redundant information and extracts a frequent pattern. The approach presented in (Hirate and Yamana, 2006) takes as input a sequence database, a value of minsupp, time constraints and a time function to align timestamps. This approach and ours use the same pattern extraction process, based on the PrefixSpan algorithm. The main difference concerns the amount of data. While the sliding window groups events gradually, relative to the size of the window, the step function groups only the events whose timestamps fall in the same level. We therefore end up with more frequent patterns, thanks to the sliding form of the window, which gradually groups close events. Concrete results are presented in the next section.


Algorithm 2: Principal.
Input: SDB, minsupp, mingap, maxgap, min_whole_interval, max_whole_interval, ws, Patterns
Patterns = null;
Find frequent items in SDB;
foreach frequent item I do
    prefix = (0, I);
    SDB|(0,I) = Projection(SDB, (0, I), 0);
    foreach pair (δt_i, I_i) in Find_Frequent_Pairs(SDB|(0,I), C1, C2) do
        newprefix = concat(prefix, (δt_i, I_i));
        if newprefix satisfies min_whole_interval and max_whole_interval then
            SDB|(0,I)|(δt_i) = Projection(SDB|(0,I), (δt_i, I_i), ws);
            Projection*(SDB|(0,I)|(δt_i), minsupp, mingap, maxgap, min_whole_interval, max_whole_interval, newprefix, Patterns);
            if newprefix ∉ Patterns then
                Add newprefix to Patterns;

5 EXPERIMENTS

In this section, we present a qualitative evaluation of our approach. In the first part, the data used for our experiments are described. Then, we detail a performance evaluation of the extraction processes used by the approaches, to motivate the method we chose to implement our work. In the third part, we compare our implementation to a GSPM implementation of the pattern-growth process.

Algorithm 3: Projection*
Input: SDB, minsupp, mingap, maxgap, min_whole_interval, max_whole_interval, prefix, Patterns
foreach pair (f(t), I) in Find_Frequent_Pairs(SDB, mingap, maxgap) do
    newprefix = concat(prefix, (f(t), I));
    if newprefix satisfies min_whole_interval and max_whole_interval then
        if support(f(t), I) ≥ minsupp then
            Projection*(SDB|(f(t), I), minsupp, mingap, maxgap, min_whole_interval, max_whole_interval, newprefix, Patterns);
            Add newprefix to Patterns;
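The concat step used in Algorithms 2 and 3 is not spelled out in the text. Reading the arithmetic of Example 6 (5 = 2 + 3 and 5 = 0 + 5), one plausible rule for the interval of the appended itemset is sketched below. This is an inference from the example, not a rule stated by the authors; the name `concat_interval` is hypothetical, and the caller would still need to check that the resulting lower bound does not exceed the upper bound.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (min_time, max_time)


def concat_interval(last: Interval, delta: Interval) -> Interval:
    """Absolute interval of the next itemset, given the interval of the last
    itemset of the prefix and the relative interval of a frequent pair."""
    lo = last[1] + delta[0]  # latest end of the prefix plus the minimum lag
    hi = last[0] + delta[1]  # earliest start of the prefix plus the maximum lag
    return (lo, hi)


# ([0, 2], AB) extended with the relative pair ([3, 5], D)
print(concat_interval((0, 2), (3, 5)))  # (5, 5)
# ([5, 5], D) extended with the relative pair ([0, 1], F)
print(concat_interval((5, 5), (0, 1)))  # (5, 6)
```

Both results agree with the frequent sequence ⟨([0, 2], AB)([5, 5], D)([5, 6], F)⟩ obtained in Example 6.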

Data Description. We applied our algorithms to real aeronautical data related to the life history of six aircraft of the same type. These data represent missions, reports carried out on different parts of the vehicles, and the execution of equipment maintenance tasks. They are organized as temporal sequences. A sequence is built by accumulating the successive events that occurred on an aircraft between occurrences of a specific maintenance task. Preprocessed sequences, taken from all vehicles and ending with the application of the same maintenance task, represent lists of temporal events preceding the execution of the task. Extracting patterns from this database consists in identifying the common usages that lead to the application of this maintenance task. It makes it possible to distinguish maintenance operations that share common root causes. Table 1 shows a sample of the sequence history for the task op_m1.

Table 1: Sample of preprocessed sequences.
ID    Sequences
S_1   ⟨(t=0, taxi, sale), (t=223, PARAPUBLIC, sandy), (t=300, EMS, normal), (t=330, report_1), (t=490, PARAPUBLIC, normal), (t=520, op_m1)⟩
S_2   ⟨(t=0, PARAPUBLIC, sandy), (t=190, taxi, normal), (t=324, OEM, salt), (t=500, op_m1)⟩
S_3   ⟨(t=0, EMS, normal), (t=190, taxi, salt), (t=340, PARAPUBLIC, normal), (t=390, report_1), (t=400, op_m1)⟩

We used a GSP implementation available in WEKA (http://www.cs.waikato.ac.nz/~ml/weka/) without any time constraints implementation. We also modified an implementation of (Fournier-Viger et al., 2008) (http://www.philippe-fournier-viger.com/spmf) to obtain the GSPM implementation. We modified the same

5 EXPERIMENTS

In this section, we present a qualitative experimenta-

tion of our approach. In a ﬁrst part, the data used for

our experimentation are described. Then, we detail a

performance evaluation of the process used by the ap-

proaches, to motivate the method that we use to im-

plement our work. In a third part, we compare our im-

plementation to a GSPM implementation of patterns

growth process.

is more important. Indeed, the application of back-

ward window allows to expand continuation patterns

with those containing unordered events on a window

size interval. Then, where GSP considers no fre-

quent patterns, our approach searches through back-

ward window redundant information and extracts a

frequent pattern. The approach presented in (Hirate

and Yamana, 2006) has as input a sequences database,

a value of minsupp, time constraints and a time func-

tion to align timestamps. This approach and ours use

the same process of extracting patterns based on Pre-

ﬁxSpan algorithm. The main difference concerns the

amount of data. While the use of the sliding window

can group events by degrees relative to the size of the

window, the function level has only the events whose

timestamps are in the same level. So, we end up with

more frequent patterns due to the sliding form of the

window, which groups gradually close events. Con-

crete results are presented in the next section.

Algorithm 2: Principal.

Input: SDB, minsupp, mingap, max gap,

min whole interval, max whole interval, ws,

Patterns

Patterns = null;

Find frequent items in SDB ;

foreach frequent item I do

preﬁx = (0, I);

SDB |

(0,I)

= Projection(SDB, (0,I), 0) ;

foreach pair (δt

, I

) in Find Frequent Pairs

(SDB|

(0,I)

, C1, C2) do

newpreﬁx = concat(preﬁx, (δt

, I

));

if newpreﬁx satisﬁes min whole interval and

max whole interval then

SDB|

(0,I)|(δt

)

= Projection(SDB|

(0,I)

, I

), ws) ;

Projection*(SDB|

(0,I)|(δt

)

, minsupp,;

mingap, maxgap, min whole interval,;

max whole interval,

newF pre f ix, FSeq) ;

if newpreﬁx 6∈ Patterns then

Add newpreﬁx to Patterns ;

5 EXPERIMENTS

In this section, we present a qualitative experimenta-

tion of our approach. In a ﬁrst part, the data used for

our experimentation are described. Then, we detail a

performance evaluation of the process used by the ap-

proaches, to motivate the method that we use to im-

plement our work. In a third part, we compare our im-

plementation to a GSPM implementation of patterns

growth process.

Algorithm 3: Projection*

Input: SDB, minsupp, mingap, maxgap,

min whole interval,

max whole interval, pre f ix, Patterns)

foreach Pair (f(t), t) in Find Frequent Pairs

(SDB, mingap, maxgap) do

newpreﬁx = concat(preﬁx, (f(t),I));

if newpreﬁx satisﬁes min whole interval and

max whole interval then

if support( f (t), I) ≥ minsupp then

Projection*(SDB|

( f (t)

)

, mingap,

maxgap,;

min whole interval,

max whole interval,;

newpreﬁx, patterns );

Add newpreﬁx to Patterns;

Data Description. We applied our algorithms to real aeronautical data describing the life history of six identical aircraft. These data record missions, reports made on different parts of the vehicles, and the execution of equipment maintenance tasks. They are organized as temporal sequences. A sequence is built by accumulating the successive events that occurred on an aircraft between occurrences of a specific maintenance task. The preprocessed sequences, taken from all vehicles and ending with the application of the same maintenance task, are lists of temporal events preceding the execution of that task. Extracting patterns from this database consists in identifying common usages that lead to the application of this maintenance task. It makes it possible to identify maintenance operations that share common root causes. Table 1 shows a sample of the sequence history for the task op_m1.

Table 1: Sample of preprocessed sequences.

ID   Sequences
S1   ⟨(t=0, taxi, sale), (t=223, PARAPUBLIC, sandy), (t=300, EMS, normal),
      (t=330, report_1), (t=490, PARAPUBLIC, normal), (t=520, op_m1)⟩
S2   ⟨(t=0, PARAPUBLIC, sandy), (t=190, taxi, normal), (t=324, OEM, salt),
      (t=500, op_m1)⟩
S3   ⟨(t=0, EMS, normal), (t=190, taxi, salt), (t=340, PARAPUBLIC, normal),
      (t=390, report_1), (t=400, op_m1)⟩

We used a GSP implementation available in WEKA (http://www.cs.waikato.ac.nz/~ml/weka/), which does not implement any time constraints. We also modified an implementation of (Fournier-Viger et al., 2008) (http://www.philippe-fournier-viger.com/spmf) to obtain the GSPM implementation. We modified the same



code to implement our approach. First, we evaluate the quality and results provided by our approach compared to those provided by GSP. Second, we evaluate the performance cost of the interesting sequences approach compared to GSPM, since both techniques are based on the same underlying algorithm, PrefixSpan. We then assess the quality of the results of these two approaches.

Figure 1: Process evaluation.

Figure 2: Performance evaluation.
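The sequence construction outlined in the data description (events accumulated on an aircraft up to each occurrence of a given maintenance task) can be sketched in Python as below. This is a minimal illustration, not the authors' preprocessing code: the function name build_sequences is hypothetical, each event is reduced to a single label, and timestamps are assumed to be re-based so that each sequence starts at t = 0, as in Table 1.

```python
def build_sequences(events, task):
    """Split one aircraft's event log into sequences ending with `task`.

    events : time-sorted list of (time, label) for a single aircraft
    task   : label of the maintenance task that closes a sequence
    Returns a list of sequences with times relative to their first event.
    Events after the last occurrence of `task` are discarded.
    """
    sequences, current = [], []
    for t, label in events:
        current.append((t, label))
        if label == task:
            t0 = current[0][0]  # re-base times on the first event
            sequences.append([(ti - t0, li) for ti, li in current])
            current = []
    return sequences
```

For example, a log containing two occurrences of op_m1 yields two sequences, each ending with op_m1 and each starting at relative time 0.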

Process Evaluation. We ran the three approaches without time constraints in order to compare the performance of the Apriori-based method with that of the pattern-growth method. Figure 1 shows that the execution time of the pattern-growth method (GSPM) is lower than that of the Apriori-based method (GSP). These results support our choice of the PrefixSpan approach.

Algorithm Evaluation. In this evaluation, we compare the execution time and the number of extracted sequences for varying values of minsupp. We compare our proposed algorithm, SPMF with a backward window size, with Generalized Sequential Pattern Mining with item intervals (GSPM). We tested three situations:
situation 1 with f(t) = t/2, mingap = 3, maxgap = 5 and ws = 2;
situation 2 with f(t) = t, min_whole_interval = 3, max_whole_interval = 7 and ws = 0;
situation 3 with f(t) = E(t/2), mingap = 1, maxgap = 5, min_whole_interval = 3, max_whole_interval = 7 and ws = 2.
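To make the three parameter settings concrete, the sketch below encodes them and checks whether the timestamps of one candidate occurrence respect the gap and whole-interval constraints. It is an illustration under assumptions that the text does not settle: E(t/2) is read as the integer part (floor) of t/2, the constraints are applied to raw timestamps rather than to f(t), ws is stored but not used by the check, and the names Constraints, SITUATIONS and satisfies are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Constraints:
    f: Callable[[float], float]          # interval discretization function
    mingap: Optional[float] = None
    maxgap: Optional[float] = None
    min_whole: Optional[float] = None    # min_whole_interval
    max_whole: Optional[float] = None    # max_whole_interval
    ws: float = 0                        # window size (not used below)


SITUATIONS = {
    1: Constraints(f=lambda t: t / 2, mingap=3, maxgap=5, ws=2),
    2: Constraints(f=lambda t: t, min_whole=3, max_whole=7, ws=0),
    # Assumption: E(x) denotes the integer part of x.
    3: Constraints(f=lambda t: math.floor(t / 2), mingap=1, maxgap=5,
                   min_whole=3, max_whole=7, ws=2),
}


def satisfies(timestamps, c):
    """Check gap and whole-interval constraints on sorted timestamps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    whole = timestamps[-1] - timestamps[0]
    if c.mingap is not None and any(g < c.mingap for g in gaps):
        return False
    if c.maxgap is not None and any(g > c.maxgap for g in gaps):
        return False
    if c.min_whole is not None and whole < c.min_whole:
        return False
    if c.max_whole is not None and whole > c.max_whole:
        return False
    return True
```

Under these assumptions, an occurrence at times 0, 3, 7 is accepted in situations 1 and 2 (gaps of 3 and 4, whole interval of 7), whereas an occurrence at times 0, 1, 9 is rejected in situation 3 because its second gap exceeds maxgap.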

As shown in Figure 2(a), our backward sliding window yields a large number of patterns. The number of extracted sequences increases exponentially as minsupp decreases. Our approach is valuable for high values of minsupp because it provides patterns that are not extracted by GSPM. For the lowest support values, the execution time of our approach is higher than that of GSPM, as shown in Figure 2(b). This is due to the greater number of possible continuations allowed by the backward window size.

Table 2: Result patterns.

GSPM results                                        support
(fn(t)=0, taxi)(fn(t)=3, op_m3)                     0.5

Interesting sequences results
(t in [0.0, 0.0], taxi)(t in [2.0, 4.0], op_m3)     1
(t in [0.0, 0.0], taxi)(t in [5.0, 7.0], op_m3)     0.7
(t in [0.0, 0.0], taxi)(t in [4.0, 6.0], op_m3)     0.8

Patterns Quality Evaluation. Table 2 shows the patterns produced by GSPM, followed by those produced by our algorithm. Where GSPM provides a single pattern, our approach provides three, because of the sliding window. This lets the user see all the frequent combinations of patterns allowed by the user parameters (window size). Our interesting sequences approach therefore gives a more exhaustive representation of the data.
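The way several interval-stamped patterns can be frequent at once can be illustrated by computing the support of patterns written in the form used in Table 2. The sketch below is only an illustration: the toy database is hypothetical and is not the aeronautical data, the functions supports and support are assumed names, offsets are measured from the first matched event, and events are matched in chronological order (the relaxed ordering permitted by the sliding window is not modelled).

```python
def _match(sequence, start, t0, rest):
    """Match the remaining (interval, item) pairs from index `start`."""
    if not rest:
        return True
    (lo, hi), item = rest[0]
    for j in range(start, len(sequence)):
        t, it = sequence[j]
        if it == item and lo <= t - t0 <= hi:
            if _match(sequence, j + 1, t0, rest[1:]):
                return True
    return False


def supports(sequence, pattern):
    """True if `sequence` contains an occurrence of `pattern`.

    sequence : time-sorted list of (time, item)
    pattern  : list of ((lo, hi), item); the first pair anchors offset 0
    """
    first_item = pattern[0][1]
    for i, (t0, item) in enumerate(sequence):
        if item == first_item and _match(sequence, i + 1, t0, pattern[1:]):
            return True
    return False


def support(sdb, pattern):
    """Fraction of sequences in `sdb` that support `pattern`."""
    return sum(supports(s, pattern) for s in sdb) / len(sdb)


# Hypothetical toy database (not taken from the paper's data).
toy_sdb = [
    [(0, "taxi"), (3, "op_m3")],
    [(0, "taxi"), (2, "EMS"), (4, "op_m3")],
    [(0, "taxi"), (6, "op_m3")],
    [(1, "EMS"), (2, "taxi"), (5, "op_m3")],
]
p_near = [((0, 0), "taxi"), ((2, 4), "op_m3")]
p_far = [((0, 0), "taxi"), ((5, 7), "op_m3")]
print(support(toy_sdb, p_near), support(toy_sdb, p_far))  # 0.75 0.25
```

On this toy data, the pattern with interval [2, 4] is supported by three of the four sequences and the pattern with interval [5, 7] by one, so both can be reported depending on minsupp.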

6 CONCLUSIONS

In this paper, we presented a new definition of interesting sequences based on the principle of sliding windows, which takes into account any order within transactions. This definition is useful for sequence data that do not require high timing precision. It makes it possible to gather as much information as possible and to represent the actual data in a richer way, without loss of information. The definition presented here is integrated into the process of the algorithm of (Fournier-Viger et al., 2008) and provides results of satisfactory quality. Future work will focus on improving performance. Another issue is the huge number of extracted sequences. Extracting maximal interesting sequences may be a way to reduce the size of the result without information loss. This approach is currently applied to the life history of aeronautic vehicles to identify common sequences preceding maintenance operations. These common behaviors will be used for better maintenance management and for forecasting vehicle stops.

REFERENCES

Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In VLDB Proceedings.

Fournier-Viger, P., Nkambou, R., and Nguifo, E. M. (2008). A knowledge discovery framework for learning task models from user interactions in intelligent tutoring systems. In Proceedings of the 7th MICAI Conf.

Hirate, Y. and Yamana, H. (2006). Generalized sequential pattern mining with item intervals. Journal of Computers, 1(3).

Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., and Hsu, M. (2004). Mining sequential patterns by pattern-growth: The PrefixSpan approach. IEEE Trans. Knowl. Data Eng.

Srikant, R. and Agrawal, R. (1996). Mining sequential patterns: Generalizations and performance improvements. In Proc. of EDBT.

KDIR 2010 - International Conference on Knowledge Discovery and Information Retrieval
