Step 2. All count frequent itemsets are inserted into C,
and their id-lists are generated.
Step 3. A candidate itemset c ∈ C is selected at random
from the set of candidate itemsets.
Step 4. f_c is evaluated from L_c. If a super-itemset
or sub-itemset of c exists in E, f_c is evaluated by
inheriting from it. Then, c is deleted from C.
• If c is probabilistic frequent, c is inserted into
F. If any sub-itemset c′ of c has not been
evaluated yet, f_c′ is evaluated, and c′ is inserted
into F and deleted from C. A super-itemset of c in C
is selected as the candidate itemset whose spmf is
evaluated next.
• If c is not probabilistic frequent, all super-itemsets
of c are deleted from C. A sub-itemset of c in C
is selected as the candidate itemset whose spmf is
evaluated next.
• If neither a super-itemset nor a sub-itemset of c
exists in C, a candidate itemset in C is selected
at random.
Step 5. If there is no candidate itemset whose spmf is
to be evaluated next, this procedure terminates.
Otherwise, Step 4 is repeated.
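The control flow of Steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: itemsets are modeled as frozensets, the function names `skip_search` and `_pick` are hypothetical, and the spmf evaluation (including inheritance from super/sub-itemsets) is replaced by a caller-supplied predicate `is_prob_frequent`. The text does not say which super-itemset or sub-itemset is chosen next, so the sketch picks one at random.

```python
import random
from typing import Callable, FrozenSet, Iterable, List, Optional, Set

Itemset = FrozenSet[str]


def _pick(pool: List[Itemset], rng: random.Random) -> Optional[Itemset]:
    """Return a random member of pool, or None if pool is empty."""
    if not pool:
        return None
    # Sort first so that a fixed seed gives a reproducible choice.
    return rng.choice(sorted(pool, key=sorted))


def skip_search(candidates: Iterable[Itemset],
                is_prob_frequent: Callable[[Itemset], bool],
                seed: int = 0) -> Set[Itemset]:
    """Return the probabilistic frequent itemsets among the candidates.

    is_prob_frequent stands in for evaluating the spmf of an itemset
    (an assumption of this sketch; inheritance is not modeled).
    """
    rng = random.Random(seed)
    C: Set[Itemset] = set(candidates)   # candidates not yet decided
    F: Set[Itemset] = set()             # probabilistic frequent itemsets
    c = _pick(list(C), rng)             # Step 3: random starting candidate
    while c is not None:                # Step 5: stop when nothing is left
        C.discard(c)                    # Step 4: c is evaluated and removed
        if is_prob_frequent(c):
            F.add(c)
            # Sub-itemsets of a frequent itemset go to F and leave C.
            # (The source also evaluates their spmf; omitted here.)
            subs = {s for s in C if s < c}
            F |= subs
            C -= subs
            nxt = _pick([s for s in C if s > c], rng)   # move up
        else:
            # All super-itemsets of an infrequent itemset are skipped.
            C -= {s for s in C if s > c}
            nxt = _pick([s for s in C if s < c], rng)   # move down
        # No related candidate left: fall back to a random one.
        c = nxt if nxt is not None else _pick(list(C), rng)
    return F
```

Counting the calls to `is_prob_frequent` against the number of candidates gives a rough measure of how many spmf evaluations the skipping saves on a given input.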
In the skip search approach, candidate itemsets that
are selected at random cannot be evaluated by inheriting
from their super/sub-itemsets. The spmf of these
itemsets is evaluated by algorithm DP or DC, which is costly.
To solve this problem, we propose another approach,
“skip search approach from maximal”. A maximal
candidate itemset (Bayardo, 1998) is selected in Step
3. In Step 4, candidate itemsets that are sub-itemsets
of the selected maximal candidate itemset are evaluated.
This procedure can avoid using DC in Step 4. The
spmf of maximal candidate itemsets has to be evaluated
by DC, and most of these itemsets are not probabilistic
frequent. However, the cost of evaluating them by DP
or DC is relatively small because the support of these
itemsets is small. An example of the order in which
spmf is evaluated in the skip search approach from
maximal is shown in Figure 1. First, a maximal
candidate itemset {a, b, c, d, e} is selected and evaluated.
Then a candidate itemset {a, c, d}, which is
a sub-itemset of {a, b, c, d, e}, is selected and evaluated.
Since {a, c, d} is not probabilistic frequent,
{a, b, c, d} and {a, c, d, e}, which are super-itemsets of
{a, c, d}, are deleted, so the spmf of {a, b, c, d} and
{a, c, d, e} need not be evaluated. Next, {a, d}, which is
a sub-itemset of {a, c, d}, is selected and evaluated.
{a, d} is probabilistic frequent, so the spmf of all its
sub-itemsets, {a} and {d}, is evaluated. Then, {a, d, e},
which is a super-itemset of {a, d}, is selected as the
candidate itemset to evaluate next. When all sub-itemsets
of {a, b, c, d, e} have been evaluated, another maximal
candidate itemset, for example {d, e, f}, is selected as the
candidate itemset to evaluate next.

Figure 1: Example of the order in which spmf is evaluated
in the skip search approach from maximal.
This enhancement is effective when all candidate itemsets
and their id-lists cannot fit in memory. Since only the
information of a maximal candidate itemset and its
sub-itemsets is required in Step 4, we can reduce
memory usage.
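The selection of maximal candidates and the reduced working set can be illustrated with a short Python sketch. It assumes, for illustration only, that the candidate itemsets are available as frozensets; the function names `maximal_itemsets` and `working_set` are hypothetical and do not come from the paper. A candidate is treated as maximal when no other candidate is a proper super-itemset of it.

```python
from typing import FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def maximal_itemsets(candidates: Iterable[Itemset]) -> List[Itemset]:
    """Candidates that have no proper super-itemset among the candidates."""
    cands = set(candidates)
    maximal = [c for c in cands if not any(c < other for other in cands)]
    # Longest first, then alphabetical, so the result is reproducible.
    return sorted(maximal, key=lambda s: (-len(s), sorted(s)))


def working_set(maximal: Itemset, candidates: Iterable[Itemset]) -> Set[Itemset]:
    """The maximal itemset together with its sub-itemsets among the candidates.

    Only these itemsets (and their id-lists) are needed while this
    maximal candidate is being processed in Step 4.
    """
    return {c for c in candidates if c <= maximal}


# Hypothetical candidate set, not taken from Figure 1.
cands = [frozenset(s) for s in
         ["abc", "ab", "ac", "bc", "a", "b", "c", "de", "d", "e"]]
for m in maximal_itemsets(cands):
    print("".join(sorted(m)), len(working_set(m, cands)))
```

In this toy input the ten candidates split into working sets of seven and three itemsets, which is the sense in which processing one maximal candidate at a time lowers the memory needed at any moment.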
4 PERFORMANCE EVALUATION
We evaluated the performance of the skip search
approaches by comparing them with the top-down
algorithm, TODIS, and the bottom-up algorithm,
p-Apriori. In the experiments, p-Apriori evaluates
spmf by inheriting from sub-itemsets as described in
Section 3.1, which is more efficient than using DC. This
algorithm is denoted as “p-Apriori w diffcalc” in the
experimental results. In the skip search approaches and
TODIS, the count frequent itemsets are found by the
Apriori algorithm. In the experimental results, the naive
skip search approach is denoted as “skip”. The skip search
approach described in Section 3.3 is denoted as “skip f
max”.
To evaluate the performance of our approach,
synthetic data emulating retail transactions are used,
where the generation procedure is based on the
method described in (Agrawal and Srikant, 1994).
The average length of a transaction is 40, the average
length of a frequent itemset is 10, and the dataset size
N is 500k. For each transaction, we assign an existential
probability drawn from a Gaussian distribution.
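The following Python sketch illustrates how existential probabilities of this kind can be assigned, and how the probability that an itemset reaches minsup follows from them. It is an illustration under assumptions, not the experimental code: the mean and standard deviation of the Gaussian are not given in the text, so the defaults below are placeholders, samples are clipped to [0, 1], and the support distribution is computed with a plain quadratic-time dynamic program that is not necessarily the DP or DC algorithm used in the experiments. All function names are hypothetical.

```python
import random
from typing import List


def gaussian_existential_probs(n: int, mean: float = 0.5, std: float = 0.125,
                               seed: int = 0) -> List[float]:
    """Draw n existential probabilities from a Gaussian, clipped to [0, 1].

    mean and std are placeholder values chosen for this sketch only.
    """
    rng = random.Random(seed)
    return [min(1.0, max(0.0, rng.gauss(mean, std))) for _ in range(n)]


def support_pmf(probs: List[float]) -> List[float]:
    """pmf[k] = probability that exactly k of the transactions exist.

    probs holds the existential probabilities of the transactions
    that contain the itemset (its id-list).
    """
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # transaction absent
            nxt[k + 1] += mass * p       # transaction present
        pmf = nxt
    return pmf


def frequentness_probability(probs: List[float], minsup: int) -> float:
    """P(support >= minsup)."""
    return sum(support_pmf(probs)[minsup:])


def is_probabilistic_frequent(probs: List[float], minsup: int,
                              minprob: float) -> bool:
    return frequentness_probability(probs, minsup) >= minprob


probs = gaussian_existential_probs(200)
print(is_probabilistic_frequent(probs, minsup=90, minprob=0.3))
```

The quadratic cost of `support_pmf` in the length of the id-list is one way to see why evaluating spmf from scratch is expensive for itemsets with large support, and why skipping evaluations matters.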
Figure 2 shows the execution time for varying
minsup. Here, minprob was set to 0.3. When minsup
is small, the difference between the execution time of
skip search approaches and TODIS. Since the aver-
age length of probabilistic frequent itemsets becomes
long for small minsup, the number of candidate item-
sets which skip search approaches can omit to evalu-