dimension array consisting of 4 itemsets in each
row. If n=3, each row will have 8 itemsets. The
itemsets are placed from left to right and from top to
bottom. The first itemset in each row or column is
called a leading itemset; every other itemset is a
non-leading itemset. The only exception is the upper
leftmost entry 0(0), which is treated as null. For
example, A(1) is the leading itemset of the second
column and D(8) is the leading itemset of the third
row. The number in the parentheses is the
corresponding decimal number of the itemset. The
value of n can be defined by the user. Table 3 shows
the result of transforming the representation from
one dimension to two dimensions.
Table 3: A two-dimension list of itemsets.
0(0) A(1) B(2) AB(3)
C(4) AC(5) BC(6) ABC(7)
D(8) AD(9) BD(10) ABD(11)
CD(12) ACD(13) BCD(14) ABCD(15)
E(16) AE(17) BE(18) ABE(19)
CE(20) ACE(21) BCE(22) ABCE(23)
DE(24) ADE(25) BDE(26) ABDE(27)
CDE(28) ACDE(29) BCDE(30) ABCDE(31)
In Table 3 we can observe that each non-leading
itemset is composed of the leading itemsets of its
row and column. Take itemset BCE as an example: it
is the combination of the leading column itemset B
and the leading row itemset CE. The other
non-leading itemsets can be verified in the same way.
This holds when the length of the row or column is 2^n.
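The layout of Table 3 can be reproduced with a short sketch. It assumes that the first item (A) occupies the lowest bit of the bit map, as the decimal numbers in Table 3 suggest; the function names `itemset_label` and `build_table` are illustrative and do not come from the paper.

```python
def itemset_label(value, items):
    """Return the itemset whose bit map has decimal value `value`, e.g. 22 -> 'BCE'."""
    label = "".join(item for pos, item in enumerate(items) if (value >> pos) & 1)
    return label or "0"  # the empty itemset is shown as 0, as in Table 3


def build_table(items, n):
    """Place all 2**len(items) itemsets in rows of length 2**n, left to right."""
    row_len = 2 ** n
    total = 2 ** len(items)
    return [
        [(itemset_label(v, items), v) for v in range(start, start + row_len)]
        for start in range(0, total, row_len)
    ]


items = "ABCDE"
table = build_table(items, n=2)
for row in table:
    print("  ".join(f"{label}({value})" for label, value in row))

# Each entry equals the union (bitwise OR) of the leading itemsets
# of its row and of its column.
for row in table:
    for col, (_, value) in enumerate(row):
        assert value == row[0][1] | table[0][col][1]
```

For instance, `table[5][2]` is BCE(22), whose leading row itemset is CE(20) and whose leading column itemset is B(2).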
The length 2^n follows from the use of the bit map
and the binary system. When the itemsets are
represented as bit maps and each row has length 2^n,
the first row contains a null and the first n items
along with their combinations, and the first column
contains a null and the remaining items along with
their combinations. For n = 2, the first row contains
the leading itemsets formed from the first n items, A
and B, and the first column contains the leading
itemsets formed from the remaining items C, D, and E.
Another interesting characteristic of Table 3 is
that items can be added incrementally: each new item,
together with the itemsets it forms, extends the first
column with new leading itemsets. When a new item is
added, its combinations with the existing itemsets
are appended at the end of the table.
For example, the upper half of Table 3 is the list
of itemsets for items {A, B, C, D} in the
two-dimension representation, with decimal numbers
from 0 to 15. When we add a new item E, it combines
with the existing itemsets of {A, B, C, D} to form the
lower half of Table 3, with decimal numbers from 16
to 31. The resulting table, with decimal numbers from
0 to 31, is exactly Table 3 for items {A, B, C, D, E}.
This means that our approach can handle item updates.
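The incremental property can be illustrated with a minimal sketch, assuming that a new item always receives the next unused bit position. The helper name `add_item` is hypothetical.

```python
def add_item(values, new_position):
    """Append the combinations of a new item with every existing itemset.

    `values` holds the decimal values of the existing itemsets and
    `new_position` is the 0-based bit position assigned to the new item.
    """
    new_bit = 1 << new_position
    return values + [v | new_bit for v in values]


abcd = list(range(16))      # itemsets over {A, B, C, D}: decimal values 0..15
abcde = add_item(abcd, 4)   # item E takes bit position 4 (decimal value 16)

# The original values are unchanged and the new ones occupy 16..31.
assert abcde[:16] == abcd
assert abcde[16:] == list(range(16, 32))
```

The existing entries keep their decimal values, so nothing computed for {A, B, C, D} has to be renumbered when E arrives.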
To simplify our process, each itemset is
represented by its corresponding column and row.
We denote their decimal values as X and Y
respectively. Take ABCE (23) for instance: it can be
decomposed as ABCE (23) = AB (3) + CE (20), so
X = 3 and Y = 20. This indicates that ABCE
decomposes into a unique pair of X-column and
Y-row.
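The decomposition amounts to splitting the bit map at position n. The following sketch assumes the decimal encoding of Table 3; `split_value` is an illustrative name.

```python
def split_value(value, n):
    """Split an itemset's decimal value into its column part X and row part Y."""
    mask = (1 << n) - 1   # selects the lowest n bits (the first n items)
    x = value & mask      # decimal value of the leading column itemset
    y = value & ~mask     # decimal value of the leading row itemset
    return x, y


x, y = split_value(23, n=2)   # ABCE(23)
print(x, y)                   # AB(3) and CE(20)
assert x + y == 23
```

Because X uses only the lowest n bits and Y only the higher ones, the two parts never overlap, which is why the pair is unique.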
Our item-transformation method uses the above
concept. In the first step, we define the value of n.
For a two-dimension representation, the length of
the row is 2^n. Therefore, the first n items and their
combined itemsets are placed in the first row,
while the remaining items and their combined
itemsets are placed in the first column in
ascending order. After we transform the itemsets by
using the bit map, their decimal values can be
calculated easily. The first position means 2^0, the
second position means 2^1, and the n-th position
means 2^(n-1).
To further simplify the representation of an itemset in
the two-dimension table with the index of column
and row, we can separate the itemset into two
independent parts (X.Y), where X and Y both start
from the origin, i.e. are counted from 0. Take itemset
ABCE as an example: with n = 2, we separate the
itemset into two parts, AB and CE. For sub-itemset
AB, the bit map is 11 and
X = 1×2^0 + 1×2^1 = 1 + 2 = 3.
For sub-itemset CE, the bit map over the remaining
items C, D, and E is 101 and
Y = 1×2^0 + 0×2^1 + 1×2^2 = 1 + 0 + 4 = 5.
Therefore the itemset ABCE can be transformed into
the form (X.Y) = (3.5). Here Y = 5 is the row index;
it equals the decimal value 20 of CE divided by
2^n = 4. To verify with Table 3, the itemset ABCE lies
in column 3 and row 5 when both are counted from 0.
This indicates that each itemset can be
represented by a unique identifier.
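The transformation into (X.Y) can be sketched as follows. The sketch assumes that itemsets are given as strings of single-letter items and that items are ordered as in Table 3; the names `transform` and `to_decimal` are hypothetical.

```python
def transform(itemset, items, n):
    """Map an itemset to its (X, Y) indices, both counted from 0.

    X is computed from the bit map over the first n items and
    Y from the bit map over the remaining items.
    """
    x = sum(1 << pos for pos, item in enumerate(items[:n]) if item in itemset)
    y = sum(1 << pos for pos, item in enumerate(items[n:]) if item in itemset)
    return x, y


def to_decimal(x, y, n):
    """Recover the decimal value used in Table 3 from the (X, Y) pair."""
    return x + y * 2 ** n


x, y = transform("ABCE", "ABCDE", n=2)
print(f"({x}.{y})")                  # the identifier of ABCE
assert to_decimal(x, y, n=2) == 23   # matches ABCE(23) in Table 3
```

Since `to_decimal` inverts `transform`, distinct itemsets always receive distinct (X.Y) pairs.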
3.3 Support Counting after Item-Transformation
The next step is to process the transformed
sub-itemsets in the form of (X.Y). We add a third
variable, Z, to represent the support count, so the
expression of an itemset becomes (X.Y.Z). The value
of Z is initialized to zero. For better storage
management, we sort the itemsets first by their
support count Z and then by X.Y, in ascending order.
For updates that add an item, we have the following
two cases: