2 and $MIS(i_1) \leq MIS(i_2) \leq \dots \leq MIS(i_k)$, is frequent, then all of its subsets involving the item having the lowest MIS value ($i_1$) need to be frequent, whereas the subsets involving only the rest of the items $i_j$, where $j \geq 2$, need not necessarily be frequent patterns. So the multiple minsup based frequent pattern mining algorithms have to consider both frequent and infrequent items (the complete set of items $I$) to generate further frequent patterns.
For example, consider a transaction dataset involving three items $i_1$, $i_2$ and $i_3$, having MIS values of 5%, 10% and 20%, respectively. If a sorted 3-pattern $\{i_1, i_2, i_3\}$ has a support of 6%, then it is a frequent pattern, because 6% is no less than the lowest MIS value among its items (5%). In this frequent pattern, the subsets that contain item $i_1$, i.e., $\{\{i_1\}, \{i_1, i_2\}, \{i_1, i_3\}\}$, have to be frequent. However, a subset involving only items $i_2$ and $i_3$, say $\{i_2, i_3\}$, may still be infrequent, for instance by having a support of 8%. For this pattern to be frequent, its support should be at least 10% (min(10%, 20%)).
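The frequency test used in this example can be written down in a few lines. The following is only a minimal sketch: the function names `min_mis` and `is_frequent` and the string labels `"i1"`, `"i2"`, `"i3"` are illustrative assumptions, and the support values are simply the ones quoted in the example, not values computed from a dataset.

```python
def min_mis(pattern, mis):
    """Return the lowest MIS value among the items of a pattern."""
    return min(mis[item] for item in pattern)


def is_frequent(pattern, support, mis):
    """A pattern is frequent if its support is no less than the
    lowest MIS value of the items it contains."""
    return support >= min_mis(pattern, mis)


# MIS values in percent, taken from the example; the labels are placeholders.
mis = {"i1": 5, "i2": 10, "i3": 20}

print(is_frequent({"i1", "i2", "i3"}, 6, mis))  # True: 6 >= min(5, 10, 20)
print(is_frequent({"i2", "i3"}, 8, mis))        # False: 8 < min(10, 20)
```

The second call shows why infrequent subsets cannot be discarded early: {i2, i3} fails its own threshold although its superset {i1, i2, i3} passes.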
The CFP-growth approach extends the “pattern-growth” methodology to multiple minimum support values.
In this approach, it is assumed that the MIS values for the items are provided by the user prior to its execution. The MIS-tree is constructed as follows. First, the items are sorted in descending order of their MIS values, say $L_1$, and their frequency values are set to zero. Next, a root node of the tree is created and labeled “null”. Next, for each transaction in the dataset, the following steps are performed to generate the MIS-tree:
1. The items in each transaction are sorted in $L_1$ order. Next, the frequencies of the items present in the transaction are updated by incrementing the frequency value of each such item by 1.
2. A branch is created for each transaction such that the nodes represent the items, the level of a node in the branch is based on the sorted order, and the count of each node is set to 1. However, in constructing the new branch for a transaction, the count of each node along a common prefix is incremented by 1, and nodes for the items following the prefix are created, linked accordingly, and their counts are set to 1.
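The two steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `Node` class, the function `build_mis_tree`, the tie-breaking rule for items with equal MIS values, and the three-transaction toy dataset are all assumptions introduced here.

```python
class Node:
    """One MIS-tree node: an item label, a count, and links to children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item label -> Node


def build_mis_tree(transactions, mis):
    # L1 order: descending MIS value. Ties keep the order in which the
    # items appear in `mis` (an assumption; sorted() is stable).
    order = sorted(mis, key=lambda item: -mis[item])
    rank = {item: pos for pos, item in enumerate(order)}
    freq = {item: 0 for item in order}   # frequencies start at zero
    root = Node(None)                    # the "null" root
    for transaction in transactions:
        # Step 1: sort the transaction in L1 order and update frequencies.
        items = sorted(set(transaction), key=rank.__getitem__)
        node = root
        for item in items:
            freq[item] += 1
            # Step 2: follow the shared prefix, creating nodes as needed.
            child = node.children.get(item)
            if child is None:
                child = Node(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, freq


# Hypothetical toy data, not taken from the paper.
mis = {"a": 4, "b": 3, "c": 2}
transactions = [["a", "c"], ["c", "b", "a"], ["b", "c"]]
root, freq = build_mis_tree(transactions, mis)
print(freq)                       # {'a': 2, 'b': 2, 'c': 3}
print(root.children["a"].count)   # 2: both a-transactions share the prefix
```

Incrementing `count` on both existing and newly created nodes covers the two cases of step 2 with one statement, since a new node starts at zero.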
To facilitate tree traversal, an item header table is built so that each item points to its occurrences in the tree via a chain of node-links. From the item frequencies, the respective support values are calculated. Using the lowest MIS value among all the items (MIS), the tree-pruning process is performed on the item header table and the MIS-tree to remove the items having support less than this lowest MIS value. After tree-pruning, the tree-merging process is performed to generate the compact MIS-tree.
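One possible reading of the pruning and merging steps is sketched below. It assumes that removing an item's node lifts its children to the node's parent, and that sibling nodes carrying the same item are then merged by adding their counts. The names `add_path`, `items_below_lowest_mis`, `merge_into` and `prune`, as well as the toy data, are hypothetical; the header table is omitted to keep the sketch short.

```python
class Node:
    def __init__(self, item, count=0):
        self.item = item
        self.count = count
        self.children = {}  # item label -> Node


def add_path(root, items):
    """Insert one transaction that is already sorted in L1 order."""
    node = root
    for item in items:
        node = node.children.setdefault(item, Node(item))
        node.count += 1


def items_below_lowest_mis(freq, mis):
    """Items whose support count is below the lowest MIS value of all items."""
    lowest = min(mis.values())
    return {item for item, f in freq.items() if f < lowest}


def merge_into(parent, child):
    """Attach `child` under `parent`, merging with a same-item sibling."""
    existing = parent.children.get(child.item)
    if existing is None:
        parent.children[child.item] = child
        return
    existing.count += child.count
    for grandchild in child.children.values():
        merge_into(existing, grandchild)


def prune(node, removed):
    """Remove nodes of pruned items bottom-up, lifting their children."""
    for child in list(node.children.values()):
        prune(child, removed)
    for child in list(node.children.values()):
        if child.item in removed:
            del node.children[child.item]
            for grandchild in child.children.values():
                merge_into(node, grandchild)


# Hypothetical toy data (counts), not taken from the paper.
mis = {"a": 3, "x": 2, "b": 2}
freq = {"a": 3, "x": 1, "b": 3}
root = Node(None)
for t in [["a", "x", "b"], ["a", "b"], ["a", "b"]]:
    add_path(root, t)

removed = items_below_lowest_mis(freq, mis)
prune(root, removed)
print(removed)                                  # {'x'}
print(root.children["a"].children["b"].count)   # 3 after merging
```

Pruning bottom-up means that any subtree lifted to a parent is already free of removed items, so a single pass over each node's children is enough.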
Table 1: Transaction dataset.
TID   Items
1     bread, jam
2     bread, jam, ball
3     bread, jam, pen
4     bread, jam, pencil
5     bread, bat, ball
6     bed, pillow
7     bed, pillow
8     ball, bat
9     ball, bat
10    ball, bat
The compact MIS-tree is mined as follows. Each item in $L_1$ is considered as a suffix pattern; next, its conditional pattern base, which is a set of prefix paths in the MIS-tree, is constructed, and mining is performed recursively on such a tree. The pattern growth is achieved by the concatenation of the suffix pattern with the frequent patterns generated from the conditional pattern.
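To make the notion of a conditional pattern base concrete, the sketch below collects the prefix paths of one suffix item by following parent links from each of its nodes. It is an illustration under stated assumptions: the header table is simplified to a Python list of nodes per item instead of a chain of node-links, the input transactions are assumed to be already sorted in $L_1$ order, and the names `build_tree` and `conditional_pattern_base` and the toy data are hypothetical.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}  # item label -> Node


def build_tree(ordered_transactions):
    """Build a prefix tree plus a header table (item -> list of nodes)."""
    root = Node(None)
    header = defaultdict(list)
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, parent=node)
                node.children[item] = child
                header[item].append(child)  # stands in for a node-link
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(item, header):
    """Return (prefix_path, count) pairs for every occurrence of `item`."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:  # occurrences directly under the root have no prefix
            base.append((path[::-1], node.count))
    return base


# Hypothetical transactions, already sorted in L1 order.
root, header = build_tree([["a", "c"], ["a", "b", "c"], ["b", "c"]])
print(conditional_pattern_base("c", header))
# [(['a'], 1), (['a', 'b'], 1), (['b'], 1)]
```

Each prefix path carries the count of the suffix node it was reached from, which is the count it contributes when the conditional tree for that suffix is built.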
For the dataset shown in Table 1, the extraction of frequent patterns using the CFP-growth algorithm is illustrated in Example 1. For ease of explanation, we express the support and MIS values of the items in this example as support counts and MIS counts.
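The support counts referred to here can be tallied directly from Table 1. The snippet below is a small sketch of that bookkeeping; the variable names are assumptions, and the item names follow the spelling used in Example 1.

```python
from collections import Counter

# Transactions of Table 1, in TID order.
transactions = [
    ["bread", "jam"],
    ["bread", "jam", "ball"],
    ["bread", "jam", "pen"],
    ["bread", "jam", "pencil"],
    ["bread", "bat", "ball"],
    ["bed", "pillow"],
    ["bed", "pillow"],
    ["ball", "bat"],
    ["ball", "bat"],
    ["ball", "bat"],
]

# Support count: number of transactions that contain the item.
support_count = Counter(item for t in transactions for item in set(t))

# Support as a percentage of the dataset size.
n = len(transactions)
support_pct = {item: 100 * c / n for item, c in support_count.items()}

print(support_count["bread"], support_pct["bread"])  # 5 50.0
print(support_count["pen"], support_pct["pen"])      # 1 10.0
```

With ten transactions, a support count of c corresponds to a support of 10c percent, which makes the two representations easy to convert between in this example.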
Example 1. For the transaction dataset shown in Table 1, the itemset I = {bread, ball, jam, bat, pillow, bed, pencil, pen}. Let the MIS values (in count) for bread, ball, jam, bat, pillow, bed, pencil and pen be 4, 4, 3, 3, 2, 2, 2 and 2, respectively. Now, using the MIS values for the items, the CFP-growth approach sorts the items in descending order of their MIS values and assigns a frequency value of zero to every item. Thus, $L_1$ contains {{bread:0}, {ball:0}, {jam:0}, {bat:0}, {pillow:0}, {bed:0}, {pencil:0}, {pen:0}}. In the first scan of the dataset shown in Table 1, the first transaction “1: bread, jam”, containing two items, is scanned in $L_1$ order, i.e., {bread, jam}, and the frequencies of the items “bread” and “jam” are incremented by 1 in $L_1$. Next, the first branch of the tree is constructed with two nodes, ⟨bread: 1⟩ and ⟨jam: 1⟩, where “bread” is linked as a child of the root and “jam” is linked as a child of “bread”. The second transaction “2: bread, jam, ball” contains three items, “bread, ball, jam” in $L_1$ order, and the frequencies of these items are incremented by 1. Next, the items in the second transaction, ordered in $L_1$, will result in a branch where “bread” is linked to the root, “ball” is linked to “bread” and “jam” is linked to “ball”. However, this branch shares the common prefix, “bread”, with the existing path for the first transaction. Therefore, the
AN IMPROVED FREQUENT PATTERN-GROWTH APPROACH TO DISCOVER RARE ASSOCIATION RULES