
 
2  SEQUENTIAL PATTERN MINING PROBLEM
In this section, we first define the problem of sequential
pattern mining and then survey some of the well-known
sequential pattern mining algorithms.
2.1 Definitions 
The formal statement of sequential pattern mining is
defined in (Agrawal, 1995) as follows:
Let $I = \{x_1, \ldots, x_m\}$ be a set of items. An itemset is
a non-empty subset of items, and an itemset with l items is
called an l-itemset.
A sequence $s = \langle s_1, \ldots, s_n \rangle$ is an ordered
list of itemsets, where $s_i$ is the i-th element of s and is
called a transaction. The number of transactions in a sequence
is called the length of the sequence and is denoted by |s|. A
sequence s with length k is called a k-sequence.
Consider two data sequences $s = \langle s_1, \ldots, s_n \rangle$
and $t = \langle t_1, \ldots, t_m \rangle$. We say that s is a
subsequence of t if s is a “projection” of t derived by deleting
elements and/or items from t. More formally, s is a subsequence
of t if there exist integers $j_1 < j_2 < j_3 < \cdots < j_n$ such
that $s_1 \subseteq t_{j_1}$, $s_2 \subseteq t_{j_2}$, …, and
$s_n \subseteq t_{j_n}$. For example, the sequences
$\langle 1\ 3 \rangle$ and $\langle 1\ 2\ 4 \rangle$ are
subsequences of $\langle 1\ 2\ 3\ 4 \rangle$, while
$\langle 3\ 1 \rangle$ is not.
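The subsequence test above can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited papers: sequences are assumed to be tuples of frozensets, and the names `make_sequence` and `is_subsequence` are hypothetical helpers introduced here.

```python
def make_sequence(*elements):
    """Build a sequence as a tuple of frozensets (one frozenset per element)."""
    return tuple(frozenset(e) for e in elements)


def is_subsequence(s, t):
    """Return True if each s_i is a subset of some t_{j_i} with j_1 < ... < j_n."""
    j = 0  # next position of t that is still available for matching
    for element in s:
        # Greedily take the earliest remaining element of t that contains `element`.
        while j < len(t) and not element <= t[j]:
            j += 1
        if j == len(t):
            return False  # ran out of t before matching all of s
        j += 1  # the matched indices must be strictly increasing
    return True


t = make_sequence([1], [2], [3], [4])
print(is_subsequence(make_sequence([1], [3]), t))       # True
print(is_subsequence(make_sequence([1], [2], [4]), t))  # True
print(is_subsequence(make_sequence([3], [1]), t))       # False
```

Matching each element at the earliest possible position is sufficient here because, without a distance constraint, an earlier match never rules out a later one.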
Following (Srikant, 1996), the sequence s is defined to be a
subsequence with a maximum distance constraint of δ, or
alternately a δ-distance subsequence, of t if there exist
integers $j_1 < j_2 < j_3 < \cdots < j_n$ such that
$s_1 \subseteq t_{j_1}$, $s_2 \subseteq t_{j_2}$, …,
$s_n \subseteq t_{j_n}$ and $j_k - j_{k-1} \le \delta$ for each
$k = 2, 3, 4, \ldots, n$. That is, the positions in t at which
adjacent elements of s occur differ by at most δ.
As a special case of the above definition, we say that s is a
contiguous subsequence of t if s is a 1-distance subsequence of
t, i.e., the elements of s can be mapped to a contiguous segment
of t.
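The δ-distance and contiguous cases can be checked with a small dynamic-programming sketch. It is an illustration under stated assumptions rather than the procedure of (Srikant, 1996): sequences are tuples of frozensets, and `make_sequence` and `is_delta_subsequence` are hypothetical names. A greedy earliest-match scan is not enough once a gap limit applies, so the sketch tracks every position at which the prefix matched so far can end.

```python
def make_sequence(*elements):
    """Build a sequence as a tuple of frozensets (one frozenset per element)."""
    return tuple(frozenset(e) for e in elements)


def is_delta_subsequence(s, t, delta):
    """Return True if s is a subsequence of t with j_k - j_{k-1} <= delta for all k >= 2."""
    if not s:
        return True
    # Positions of t at which the first element of s can be matched.
    ends = {j for j, tj in enumerate(t) if s[0] <= tj}
    for element in s[1:]:
        # Keep positions that contain `element` and lie within `delta`
        # of some position where the previous element was matched.
        ends = {
            j
            for j, tj in enumerate(t)
            if element <= tj and any(0 < j - p <= delta for p in ends)
        }
        if not ends:
            return False
    return bool(ends)


t = make_sequence([1], [2], [3], [4])
print(is_delta_subsequence(make_sequence([1], [3]), t, 2))  # True
print(is_delta_subsequence(make_sequence([1], [3]), t, 1))  # False
print(is_delta_subsequence(make_sequence([2], [3]), t, 1))  # True (contiguous)
```

Calling the function with `delta=1` gives the contiguous-subsequence test defined above.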
A sequence s is said to contain a sequence p if p is a 
subsequence of s.  
The support of a pattern p is defined as the fraction 
of sequences in the input database that contain p.  
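As a hedged illustration of the support definition, the sketch below counts the fraction of database sequences that contain a pattern. The tuple-of-frozensets representation, the tiny example database, and the names `make_sequence`, `contains`, and `support` are assumptions made for this example only.

```python
def make_sequence(*elements):
    """Build a sequence as a tuple of frozensets (one frozenset per element)."""
    return tuple(frozenset(e) for e in elements)


def contains(s, p):
    """Return True if sequence s contains pattern p, i.e. p is a subsequence of s."""
    remaining = iter(s)  # shared iterator, so matches occur in increasing order
    return all(any(element <= candidate for candidate in remaining) for element in p)


def support(pattern, database):
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        raise ValueError("support is undefined for an empty database")
    return sum(contains(s, pattern) for s in database) / len(database)


database = [
    make_sequence([1], [2], [3], [4]),
    make_sequence([3], [1]),
    make_sequence([1, 2], [3]),
    make_sequence([2], [4]),
]
print(support(make_sequence([1], [3]), database))  # 0.5
```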
Given a set of sequences S, we say that s ∈ S is maximal if
there are no sequences in S - {s} that contain it.
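The maximality condition can be sketched as a filter over S. This is a straightforward quadratic check written for illustration, assuming S holds distinct sequences represented as tuples of frozensets; `make_sequence`, `contains`, and `maximal_sequences` are hypothetical names.

```python
def make_sequence(*elements):
    """Build a sequence as a tuple of frozensets (one frozenset per element)."""
    return tuple(frozenset(e) for e in elements)


def contains(s, p):
    """Return True if sequence s contains p, i.e. p is a subsequence of s."""
    remaining = iter(s)  # shared iterator, so matches occur in increasing order
    return all(any(element <= candidate for candidate in remaining) for element in p)


def maximal_sequences(S):
    """Keep each s in S that is not contained in any other sequence of S."""
    return [
        s
        for s in S
        if not any(other != s and contains(other, s) for other in S)
    ]


S = [
    make_sequence([1], [3]),
    make_sequence([1], [2], [3], [4]),
    make_sequence([3], [1]),
]
result = maximal_sequences(S)
print(len(result))  # 2
```

In this example the sequence with elements 1 and 3 is dropped because the four-element sequence contains it, while the other two sequences are kept.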
2.2 Sequential Pattern Mining Algorithms
Sequential pattern mining has been intensively studied in
recent years, so a great diversity of algorithms for sequential
pattern mining exists. Most of these algorithms are based on the
Apriori property proposed in association rule mining
(Agrawal, 1994), which states that any sub-pattern of a frequent
pattern must be frequent. Based on this property, a series of
Apriori-like algorithms has been proposed: AprioriAll,
AprioriSome, and DynamicSome in (Agrawal, 1995), and GSP
(Srikant, 1996). Later, another series of data-projection-based
algorithms became popular because of their efficiency, including
FreeSpan (Han, 2000) and PrefixSpan (Pei, 2001). Recently, Zaki
proposed an efficient lattice-based algorithm called SPADE
(Zaki, 2001). After that, a fast algorithm called SPAM
(Ayres, 2002) was proposed, which uses a vertical bitmap
representation of the data. In addition, a memory-indexing-based
approach called MEMISP (Ming-Yen, 2002) was proposed, which uses
a memory indexing scheme to reduce the I/O complexity.
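To make the Apriori property concrete, the following sketch shows how it can prune candidates in a level-wise search. It is a simplified illustration and not the candidate generation of AprioriAll, GSP, or any other algorithm named above: patterns are assumed to be tuples of single items, and `has_infrequent_subpattern` and `prune_candidates` are hypothetical names.

```python
def has_infrequent_subpattern(candidate, frequent_prev):
    """Return True if deleting one element of `candidate` gives a pattern
    that is missing from the frequent patterns of the previous level."""
    return any(
        candidate[:i] + candidate[i + 1:] not in frequent_prev
        for i in range(len(candidate))
    )


def prune_candidates(candidates, frequent_prev):
    """Keep only candidates whose one-element-shorter sub-patterns are all frequent."""
    return [c for c in candidates if not has_infrequent_subpattern(c, frequent_prev)]


# Assumed frequent 2-sequences and candidate 3-sequences for the example.
frequent_2 = {(1, 2), (1, 3), (2, 3), (2, 4)}
candidates_3 = [(1, 2, 3), (1, 2, 4)]
print(prune_candidates(candidates_3, frequent_2))  # [(1, 2, 3)]
```

The candidate (1, 2, 4) is discarded without scanning the database because its sub-pattern (1, 4) is not frequent, so by the Apriori property the candidate cannot be frequent either.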
3  SEQUENTIAL PATTERN MINING AND CONSTRAINTS
Like many frequent pattern mining problems, sequential pattern
mining faces two major difficulties: (1) effectiveness: mining
may return a huge number of patterns, many of which could be
uninteresting to users, and (2) efficiency: mining the complete
set of sequential patterns in a large sequence database often
takes substantial processing power. Constraint-based mining may
overcome both difficulties, since constraints usually represent
the user's interest and focus, which confines the patterns to be
found to a particular set of conditions. Moreover, if
constraints can be pushed deep into the mining process,
efficiency is likely to improve because the search can be
focused. This motivates the study of constraint-based mining of
sequential patterns.
3.1 Categories of Constraints 
For real-world data mining, it is useful to examine constraints
from the application point of view. These constraints are
presented in (Pei, 2002). Although this list is by no means
complete, it covers most of the interesting constraints in
applications.
Alternatively, constraints can be categorized according to
their properties for constraint pushing in the candidate
generation and pruning processes (Ng, 1998; Pei, 2000;
Pei, 2001). Monotonicity, anti-monotonicity, and succinctness
are three categories of constraints that we briefly discuss
below.