applications. The sequential patterns at lower
concept levels often carry more specific and
concrete information, while those at higher concept
levels carry more general information. Mining both
requires progressively deepening the mining process
across multiple concept levels. In many cases, concept
hierarchies over items are available. Given a set of
transactions and a concept hierarchy over items
contained in the transactions, association patterns at
any level of the hierarchy can be found by
developing appropriate algorithms (Chen et al.,
2001).
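As a hedged illustration of how a concept hierarchy lets patterns be examined at any level, the Python sketch below rewrites each item of a transaction as its ancestor at a chosen level. The hierarchy, the item names, and the helper `generalize` are hypothetical and are not taken from Chen et al. (2001).

```python
# Hypothetical concept hierarchy: each item maps to its parent; the root maps to None.
PARENT = {
    "skim milk": "milk",
    "whole milk": "milk",
    "wheat bread": "bread",
    "milk": "food",
    "bread": "food",
    "food": None,
}


def path_from_root(item):
    """Return the chain of concepts from the root down to `item`."""
    path = [item]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path[::-1]


def generalize(transaction, level):
    """Replace every item by its ancestor at `level` (root = level 0).

    Items that already sit above the requested level are kept unchanged.
    """
    result = set()
    for item in transaction:
        path = path_from_root(item)
        result.add(path[min(level, len(path) - 1)])
    return result


transactions = [{"skim milk", "wheat bread"}, {"whole milk"}]
level_1 = [generalize(t, 1) for t in transactions]
```

Any Boolean pattern-mining routine could then be run on `level_1` to look for patterns at the first concept level, and on the raw transactions for the most specific level.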
Quantitative association rules over a set of
purchased items in a customer transaction database
were defined over quantitative and categorical
attributes of the items (Srikant and Agrawal, 1996).
The values of categorical attributes were mapped to
a set of contiguous integers, while the domain of each
quantitative attribute was discretized into intervals
by fine-partitioning its values and combining adjacent
partitions as necessary; the intervals were then
mapped to contiguous integers. As a result, each
attribute took the form <attribute, value>, where
value was the mapped integer of an interval for
quantitative attributes or of a single value for
categorical attributes. Then the
algorithms for finding Boolean association rules can
be used on the transformed database to discover
quantitative association rules. Several algorithms
have been proposed (Agrawal and Srikant, 1994;
Agrawal and Srikant, 1995; Chen et al., 2001), but
few of them address quantitative sequential patterns.
Moreover, discretizing the domain of quantitative
attributes into intervals may yield intervals that
are not concise or meaningful enough for human
experts to easily obtain nontrivial knowledge from
the discovered rules.
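The mapping to <attribute, value> pairs described above can be sketched as follows. This is a minimal illustration only: the interval boundaries are assumed to be given (the fine-partitioning and merging procedure of Srikant and Agrawal is not reproduced), and the attribute names, cut points, and category codes are hypothetical.

```python
from bisect import bisect_right

# Hypothetical interval boundaries for a quantitative attribute "quantity":
# [0, 5), [5, 20), [20, inf) are mapped to the integers 1, 2, 3.
QUANTITY_CUTS = [5, 20]

# Hypothetical categorical attribute with its values mapped to contiguous integers.
CATEGORY_CODES = {"dairy": 1, "bakery": 2, "produce": 3}


def encode_quantitative(value, cuts):
    """Return the 1-based index of the interval that contains `value`."""
    return bisect_right(cuts, value) + 1


def encode_record(record):
    """Turn a raw record into a set of <attribute, value> pairs."""
    return {
        ("quantity", encode_quantitative(record["quantity"], QUANTITY_CUTS)),
        ("category", CATEGORY_CODES[record["category"]]),
    }


encoded = encode_record({"quantity": 7, "category": "dairy"})
```

Each pair in `encoded` can be treated as a Boolean item, which is what allows a Boolean association-rule algorithm to run on the transformed data.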
In this study, we present an algorithm for
mining sequential patterns at multiple levels with
quantitative attributes. Instead of using the partition
method discussed above, the algorithm adopts fuzzy
sets, which were proposed by Zadeh (1965). Since then,
much progress has been made in the theory and
application of fuzzy sets (Chen et al., 2001). The fuzzy
concept is considered better than the partition
method as fuzzy sets provide a smooth transition
between member and non-member of a set. The use
of fuzzy techniques makes the algorithms resilient
to noise and missing values in the databases. Fuzzy
concepts are not confined to a single attribute.
Instead, they can be defined on a set of attributes.
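To make the contrast between a crisp partition and a fuzzy set concrete, the sketch below compares the two for a single quantitative value. The triangular membership function and all its parameters are assumptions chosen for illustration; the text does not prescribe a particular membership shape.

```python
def crisp_medium(x):
    """Crisp partition: x is 'medium' only inside the interval [10, 20)."""
    return 1.0 if 10 <= x < 20 else 0.0


def triangular(x, left, peak, right):
    """Triangular membership: rises on [left, peak], falls on [peak, right]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_medium(x):
    """Fuzzy set 'medium' with hypothetical parameters (5, 15, 25)."""
    return triangular(x, 5, 15, 25)


# Near the crisp boundary at 10 the crisp membership jumps from 0 to 1,
# while the fuzzy membership changes only slightly.
crisp_jump = crisp_medium(10) - crisp_medium(9.9)
fuzzy_step = fuzzy_medium(10) - fuzzy_medium(9.9)
```

A value of 9.9 is excluded outright by the crisp interval but still belongs to the fuzzy set to a degree just under one half, which is the smooth transition referred to above.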
The proposed method for mining fuzzy
multiple-level sequential patterns uses a
hierarchically encoded customer-sequence table,
instead of the original customer transaction table.
The problem of mining multiple-level sequential
patterns with quantitative attributes can be split into
four steps:
(1) Transforming the original database into a
hierarchically encoded customer-sequence table;
(2) Fuzzy partitioning of each quantitative
attribute at each concept level;
(3) Finding all fuzzy large sequences at every
concept level using a top-down, progressively
deepening mining process;
(4) Generating all fuzzy sequential patterns and
sequential rules from the result of step 3.
Step 3 is the most crucial step of the method. Once
all the fuzzy large sequences at each concept
level have been discovered, it is not difficult to derive
the corresponding sequential patterns and
sequential rules.
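Step 1 can be pictured with the following sketch, which groups transactions by customer, orders them by transaction time, and replaces each item by a hierarchy code. The encoding scheme (one digit per concept level), the item codes, and the sample rows are all assumptions made for illustration; the paper's actual encoding may differ.

```python
from collections import defaultdict

# Hypothetical hierarchy codes, one digit per concept level:
# "11" = category 1 (milk), member 1 (skim milk).
ITEM_CODE = {"skim milk": "11", "whole milk": "12", "wheat bread": "21"}

# Hypothetical rows: (customer_id, transaction_time, items purchased).
ROWS = [
    (1, 3, ["wheat bread"]),
    (2, 1, ["whole milk"]),
    (1, 1, ["skim milk", "wheat bread"]),
]


def build_customer_sequences(rows):
    """Group rows by customer and order each customer's itemsets by time."""
    by_customer = defaultdict(list)
    for customer_id, time, items in rows:
        itemset = tuple(sorted(ITEM_CODE[i] for i in items))
        by_customer[customer_id].append((time, itemset))
    return {
        cid: [itemset for _, itemset in sorted(entries)]
        for cid, entries in by_customer.items()
    }


def at_level(sequence, level):
    """View an encoded sequence at a concept level by truncating each code."""
    return [tuple(sorted({code[:level] for code in itemset})) for itemset in sequence]


sequences = build_customer_sequences(ROWS)
customer_1_level_1 = at_level(sequences[1], 1)
```

With prefix codes of this kind, moving to a higher concept level is a string truncation, which is one reason an encoded table is convenient for a top-down mining process.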
The paper is organized as follows. Section 2
introduces some related concepts of multiple-level
sequential patterns and fuzzy partitions of the
quantitative attributes. Based on these concepts the
problem of mining fuzzy multiple-level sequential
patterns can be formally characterized. Section 3
describes the method for mining fuzzy multiple-
level sequential patterns in detail. An algorithm for
discovering large sequences at each concept level is
presented and discussed. Section 4 concludes this
study.
2 PROBLEM STATEMENT
In a given customer transactions database D, each
transaction consists of the following fields:
customer-id, transaction-time, and the items
purchased in the transaction. No customer has more
than one transaction at the same transaction-time.
Each item is a binary variable representing whether
an item was bought or not. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a
set of literals. An itemset is a non-empty set of
items. A sequence is a non-empty and ordered list of
itemsets. We denote an itemset by $(i_1, i_2, \ldots, i_m)$,
where $i_j$ is an item. The length of an itemset is the
number of items in it. An itemset of length k is
called a k-itemset. We denote a sequence S by
$\langle s_1, s_2, \ldots, s_n \rangle$, where $s_j$ is an itemset. The length of a
sequence is the number of itemsets in it. A sequence
of length k is called a k-sequence. The sequence
formed by the concatenation of two sequences A and
B is denoted as $\langle A, B \rangle$. The following concept
definitions are based on (Agrawal and Srikant,
1995; Chen et al., 2001).
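The notions of itemset, sequence, length, and concatenation introduced above can be mirrored in a few lines of Python. The representation chosen here (a `frozenset` for an itemset, a tuple of itemsets for a sequence) and the helper names are assumptions for illustration only.

```python
def itemset(*items):
    """Build a non-empty itemset; its length is the number of items."""
    if not items:
        raise ValueError("an itemset must be non-empty")
    return frozenset(items)


def sequence(*itemsets):
    """Build a non-empty, ordered list of itemsets; its length is the number of itemsets."""
    if not itemsets:
        raise ValueError("a sequence must be non-empty")
    return tuple(itemsets)


def concat(a, b):
    """Concatenation <A, B> of two sequences A and B."""
    return a + b


A = sequence(itemset("i1", "i2"), itemset("i3"))  # a 2-sequence
B = sequence(itemset("i2"))                       # a 1-sequence
AB = concat(A, B)                                 # a 3-sequence
```

Here `len(A[0])` is 2, so the first element of `A` is a 2-itemset, and `len(AB)` is 3, so the concatenation is a 3-sequence.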
Definition 1: All the transactions of a customer
can together be viewed as a sequence, where each
transaction corresponds to a set of items, and the list
of transactions, ordered by increasing transaction-
time, corresponds to a sequence. A transaction made
FUZZY MULTIPLE-LEVEL SEQUENTIAL PATTERNS DISCOVERY FROM CUSTOMER TRANSACTION
DATABASES