3 ALGORITHM DESCRIPTION
We now introduce the InterTraSM algorithm. As
mentioned earlier, InterTraSM adapts the SmartMiner
algorithm described in (Zou, Chu and Lu, 2002) to
intertransaction association rule mining, so the
theoretical foundations of the two algorithms are
very similar. Our main contribution has been to
identify intertransaction mining as a domain where
searching for maximal frequent itemsets (MFI) would
lead to an improvement in performance, and to apply
and customize the existing algorithm to the
intertransaction analysis case; we also provide a
new implementation.
InterTraSM finds maximal frequent itemsets
(MFI) of extended items from a transactional
database. The algorithm uses depth-first search and,
to improve performance, dynamically reorders the
tail of the current node to eliminate infrequent
items from it. A hash table is also used to save
the itemsets discovered as frequent at node level,
so that the search does not go down a tree path that
was already investigated while exploring a maximal
frequent itemset.
As mentioned, the algorithm performs a depth-first
search, so at any step it works on a node of a
search tree. We describe below the data that the
algorithm manages at node level and how this data
is processed.
A node N is identified as X:Y, where X (the
head) is the set of items that have been discovered to
be part of a frequent itemset, and Y (the tail) is the
set of items that still have to be explored. The
purpose of the node is to find maximal frequent
itemsets in the transaction set T(X) – all the
transactions that include X.
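To make the X:Y notation concrete, the following minimal sketch computes T(X) for a toy database. The function name `project` and the sample transactions are illustrative assumptions and are not taken from the InterTraSM implementation.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def project(transactions: List[Itemset], head: Itemset) -> List[Itemset]:
    """Return T(X): the transactions that contain every item of the head X."""
    return [t for t in transactions if head <= t]


# Toy database (hypothetical data); each transaction is a set of items.
db = [frozenset("abc"), frozenset("abd"), frozenset("acd"), frozenset("bcd")]

# Node X:Y with head X = {a, b} and tail Y = {c, d}.
head = frozenset("ab")
tail = frozenset("cd")
t_x = project(db, head)  # keeps the transactions {a, b, c} and {a, b, d}
```

With an empty head, as in the starting node, `project` returns the whole database, since every transaction includes the empty set.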
The starting node is Φ:E (the empty set and the
set of all the possible extended items).
The entry data for a node are:
- the transaction set T(X)
- the tail Y
- the global data information Ginf, which is the
tail information for the node known so far
(this contains the itemsets that have been
discovered in a previous step to be frequent
in T(X)).
The exit data for a node are:
- the updated Ginf
- the discovered maximal frequent itemsets
Mfi.
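The entry and exit data of a node can be pictured as two small records. This is only a sketch: the class and field names (`NodeInput`, `NodeOutput`, `ginf`, `mfi`) are hypothetical, and the actual implementation may organize this data differently.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


@dataclass
class NodeInput:
    transactions: List[Itemset]  # T(X): transactions that include the head X
    tail: List[str]              # Y: items still to be explored
    ginf: Set[Itemset]           # itemsets already known to be frequent in T(X)


@dataclass
class NodeOutput:
    ginf: Set[Itemset]           # updated tail information
    mfi: Set[Itemset]            # maximal frequent itemsets found at this node


# Hypothetical entry data for a node whose head is {a}.
entry = NodeInput(
    transactions=[frozenset("abc"), frozenset("abd")],
    tail=["b", "c", "d"],
    ginf={frozenset("b")},
)
```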
The data processing at a node N is described
below:
- count the support for each item from Y in the
transaction set T(X)
- remove the infrequent items from Y
- while (Y has at least one element)
  - select from Y an item a_i to be the head of
  the next state S_{i+1}
  - Y_{i+1} = Y – {a_i} will be the tail for S_{i+1}
  - obtain the auxiliary tail information for
  S_{i+1} by projecting on Y_{i+1} the itemsets that
  contain a_i from the tail information Ginf
  - recursively call the algorithm for the node
  N_{i+1} = Xa_i : Y_{i+1}. The returned values will
  be Mfi_i and the updated tail information.
  - Y = Y_{i+1}
- end (while)
The processing of the node returns the maximal
frequent itemsets Mfi = ∪_{a_i} Mfi_i, and the
updated Ginf as the itemsets in the original Ginf that
have not been marked as deleted.
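The node processing described above can be approximated with a short recursive sketch. It is a deliberate simplification and not the InterTraSM or SmartMiner code: instead of passing projected tail information (Ginf) between nodes, it keeps one shared set `found` of the maximal itemsets discovered so far and uses it to skip paths that are already covered. The function names, the ordering heuristic and the use of an absolute support count are assumptions made for illustration.

```python
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def search(transactions: List[Itemset], head: Itemset, tail: List[str],
           min_support: int, found: Set[Itemset]) -> None:
    """Depth-first search at node head:tail; results accumulate in found."""
    # Count the support of each tail item in T(X).
    counts = {a: sum(1 for t in transactions if a in t) for a in tail}
    # Remove infrequent items. Exploring the least frequent item first is
    # one possible ordering heuristic (an assumption, not from the paper).
    tail = sorted((a for a in tail if counts[a] >= min_support),
                  key=lambda a: (counts[a], a))
    if not tail:
        # The head cannot be extended; keep it unless a known MFI covers it.
        if head and not any(head <= m for m in found):
            found.add(head)
        return
    while tail:
        # Every itemset still reachable from this node is a subset of
        # head + tail, so stop if a known MFI already covers that union.
        if any((head | frozenset(tail)) <= m for m in found):
            break
        a = tail.pop(0)  # a_i; the remaining tail plays the role of Y_{i+1}
        child_transactions = [t for t in transactions if a in t]
        search(child_transactions, head | {a}, list(tail), min_support, found)


def maximal_frequent_itemsets(transactions: List[Itemset],
                              min_support: int) -> Set[Itemset]:
    """Start the search at the root node (empty head, all items as tail)."""
    items = sorted(set().union(*transactions))
    found: Set[Itemset] = set()
    search(list(transactions), frozenset(), items, min_support, found)
    return found


# Toy database (hypothetical data) with an absolute support threshold of 2.
toy_db = [frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("cd")]
toy_mfi = maximal_frequent_itemsets(toy_db, 2)
```

On this toy database the search reports {a, b, c} and {d}: once {a, b, c} is known, the branches for {a, b} and {b, c} are skipped because that itemset already covers them.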
As mentioned, InterTraSM uses extended items
instead of the intratransaction items used by
SmartMiner. A customization for this case is that the
first node we select while searching in depth from
the root of the tree corresponds only to items from
the first interval (interval 0). This was done because
each frequent itemset has to contain at least one
item from interval 0.
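The interval-0 restriction can be illustrated with a small sketch. It assumes the usual sliding-window view of intertransaction mining, in which an extended item is a pair (item, interval offset) and each window start yields one extended transaction; the function names, the window size and the data are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

ExtendedItem = Tuple[str, int]  # (item, interval offset inside the window)


def extended_transactions(db: List[Set[str]],
                          window: int) -> List[FrozenSet[ExtendedItem]]:
    """Build one extended transaction per window start position."""
    result = []
    for start in range(len(db)):
        ext = frozenset(
            (item, off)
            for off in range(window)
            if start + off < len(db)
            for item in db[start + off]
        )
        result.append(ext)
    return result


def root_candidates(items: Set[ExtendedItem]) -> List[ExtendedItem]:
    """Items allowed as the first choice from the root: interval 0 only."""
    return sorted(e for e in items if e[1] == 0)


# Hypothetical database: one transaction per interval, window of 2 intervals.
interval_db = [{"a"}, {"b"}, {"a", "c"}]
exts = extended_transactions(interval_db, 2)
all_items = set().union(*exts)
first_level = root_candidates(all_items)
```

Here `first_level` contains only the extended items with offset 0, so items such as ("b", 1) can appear deeper in the tree but never as the first item chosen from the root.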
We created our own implementation of the
algorithm in C, with a structure similar to that of
the SmartMiner algorithm described in (Zou, Chu and
Lu, 2002), which was implemented in Java, but with
modifications for the intertransaction case. We felt
that writing the algorithm in C enabled us to better
control its memory use.
4 PERFORMANCE STUDY
We used both synthetic data and real data to evaluate
the performance of the algorithm.
To generate the synthetic data we used the same
generator as the one described in (Luhr, West and
Venkatesh, 2007) to evaluate EFP-Tree, kindly
provided to us by the authors. It uses the same
method as the one used to evaluate FITI and
ITP-Miner.
The real data consists of two datasets, WINNER
and LOSER, similar to those used in (Lee and
ICSOFT 2008 - International Conference on Software and Data Technologies