Definition 2.11. A fuzzy itemset $\langle X, F_X \rangle$ is called a locally large fuzzy itemset at site $S_i$ if $FS_i\langle X, F_X \rangle \ge minsup$.
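To illustrate Definition 2.11, the sketch below computes a local fuzzy support and compares it with minsup. The support definitions are not repeated in this section, so it assumes that a transaction's membership degree in $\langle X, F_X \rangle$ is the minimum of its item memberships (min t-norm), and that $FS_i$ is the resulting fuzzy support count divided by $|DB_i|$. The encoding of fuzzy items as (attribute, fuzzy-set) pairs, the membership functions and all function names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

FuzzyItem = Tuple[str, str]                          # hypothetical: (attribute, fuzzy-set label)
FuzzySets = Dict[FuzzyItem, Callable[[float], float]]
Transaction = Dict[str, float]                       # attribute -> crisp value

def fuzzy_support_count(itemset: Sequence[FuzzyItem], db: Sequence[Transaction],
                        fuzzy_sets: FuzzySets) -> float:
    """Sum over transactions of the min of the item membership degrees (min t-norm assumed)."""
    total = 0.0
    for t in db:
        # A missing attribute contributes membership 0.
        degrees = [fuzzy_sets[item](t[item[0]]) if item[0] in t else 0.0 for item in itemset]
        total += min(degrees, default=1.0)           # empty itemset: fully supported
    return total

def local_fuzzy_support(itemset: Sequence[FuzzyItem], db: Sequence[Transaction],
                        fuzzy_sets: FuzzySets) -> float:
    """FS_i<X, F_X>: count at site S_i normalised by |DB_i| (assumed normalisation)."""
    return fuzzy_support_count(itemset, db, fuzzy_sets) / len(db) if db else 0.0

def is_locally_large(itemset: Sequence[FuzzyItem], db: Sequence[Transaction],
                     fuzzy_sets: FuzzySets, minsup: float) -> bool:
    return local_fuzzy_support(itemset, db, fuzzy_sets) >= minsup

# Hypothetical piecewise-linear membership functions, clipped to [0, 1].
fuzzy_sets: FuzzySets = {
    ("age", "young"): lambda x: max(0.0, min(1.0, (40 - x) / 20)),
    ("income", "high"): lambda x: max(0.0, min(1.0, (x - 30) / 40)),
}
db_1 = [{"age": 25, "income": 70}, {"age": 35, "income": 50}, {"age": 50, "income": 90}]
X = [("age", "young"), ("income", "high")]
print(local_fuzzy_support(X, db_1, fuzzy_sets))      # prints 0.3333333333333333
```

The per-transaction degrees are 0.75, 0.25 and 0.0, so the itemset is locally large at this site for minsup = 0.3 but not for minsup = 0.4.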
Definition 2.12. If a fuzzy itemset $\langle X, F_X \rangle$ is both globally large and locally large at a site $S_i$, it is called a gl-large fuzzy itemset at site $S_i$.
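Definition 2.12 combines a global and a local test. A minimal sketch, assuming each site has already computed the fuzzy support count of $\langle X, F_X \rangle$, and that the global fuzzy support is the sum of the site counts divided by $|DB| = \sum_j |DB_j|$ (the function name and its inputs are hypothetical):

```python
from typing import Sequence

def is_gl_large(counts: Sequence[float], sizes: Sequence[int], site: int,
                minsup: float) -> bool:
    """True if <X, F_X> is globally large and locally large at the given site.

    counts[j] is the fuzzy support count of <X, F_X> in DB_j and sizes[j] = |DB_j|.
    """
    global_support = sum(counts) / sum(sizes)      # assumed: FS = sum_j count_j / |DB|
    local_support = counts[site] / sizes[site]     # FS_i = count_i / |DB_i|
    return global_support >= minsup and local_support >= minsup

counts, sizes = [4.0, 1.0, 0.5], [10, 10, 10]      # global support 5.5 / 30, about 0.183
print([is_gl_large(counts, sizes, i, 0.15) for i in range(3)])  # [True, False, False]
print(is_gl_large(counts, sizes, 0, 0.2))   # False: locally large at site 0 but not globally large
```

The second call shows that being locally large at a site is not sufficient: the itemset must also pass the global threshold.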
In the following, we denote by $L$ the set of all globally large fuzzy itemsets in DB, and by $L^{(k)}$ the set of all globally large fuzzy k-itemsets in DB.
Problem 2 (Distributed Mining of Fuzzy Association Rules). Given the set of items $I$, the distributed database $DB = \{DB_1, DB_2, \ldots, DB_n\}$, the fuzzy sets associated with the attributes in $I$, the minimum support threshold (minsup) and the minimum confidence threshold (minconf), extract all global fuzzy association rules.
3 THE DISTRIBUTED ALGORITHM
In (Cheung D.W., 1996), the authors proposed the DMA algorithm for mining Boolean association rules from distributed databases.
3.1 Generating the set of candidate fuzzy itemsets
The number of candidate fuzzy itemsets is reduced on the basis of the following properties of globally large and locally large fuzzy itemsets:
Lemma 2. If a fuzzy itemset $\langle X, F_X \rangle$ is locally large at a site $S_i$, then all its subsets are also locally large at site $S_i$.
Lemma 3. If a fuzzy itemset $\langle X, F_X \rangle$ is globally large, then there exists a site $S_i$ ($1 \le i \le n$) such that $\langle X, F_X \rangle$ is locally large at site $S_i$.
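Lemma 3 follows from writing the global fuzzy support as a weighted average of the local ones. Since the support definitions are not repeated in this section, the derivation assumes that $FS\langle X, F_X \rangle$ is the total fuzzy support count $\sum_i c_i$ divided by $|DB| = \sum_i |DB_i|$, that $FS_i\langle X, F_X \rangle = c_i / |DB_i|$, and that every $DB_i$ is non-empty:

$$
FS\langle X, F_X \rangle = \frac{\sum_{i=1}^{n} c_i}{|DB|} = \sum_{i=1}^{n} \frac{|DB_i|}{|DB|}\, FS_i\langle X, F_X \rangle .
$$

If $FS_i\langle X, F_X \rangle < minsup$ at every site, then, because the weights $|DB_i|/|DB|$ are positive and sum to 1,

$$
FS\langle X, F_X \rangle < \sum_{i=1}^{n} \frac{|DB_i|}{|DB|}\, minsup = minsup ,
$$

so a globally large fuzzy itemset must be locally large at some site.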
Lemma 4. If a fuzzy itemset $\langle X, F_X \rangle$ is a gl-large fuzzy itemset at a site $S_i$ ($1 \le i \le n$), then all its fuzzy sub-itemsets $\langle Y, F_Y \rangle$, $Y \subseteq X$, are also gl-large fuzzy itemsets at site $S_i$.
We use $GL_i$ to denote the set of all gl-large fuzzy itemsets at site $S_i$, and $GL_i^{(k)}$ to denote the set of all gl-large fuzzy k-itemsets at site $S_i$.
Lemma 5. If $\langle X, F_X \rangle \in L^{(k)}$ (i.e., it is a globally large fuzzy k-itemset), then there exists a site $S_i$ ($1 \le i \le n$) such that $\langle X, F_X \rangle$ and all its fuzzy (k-1)-sub-itemsets are gl-large fuzzy itemsets at site $S_i$.
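Lemma 5 can be read as a chain of the earlier results. Lemma 3 provides a site where the itemset is locally large; since the itemset is also globally large, Definition 2.12 makes it gl-large there; Lemma 4 then carries the property down to its (k-1)-sub-itemsets:

$$
\langle X, F_X \rangle \in L^{(k)}
\;\overset{\text{Lemma 3}}{\Longrightarrow}\;
\exists\, S_i:\ FS_i\langle X, F_X \rangle \ge minsup
\;\overset{\text{Def. 2.12}}{\Longrightarrow}\;
\langle X, F_X \rangle \in GL_i^{(k)}
\;\overset{\text{Lemma 4}}{\Longrightarrow}\;
\langle Y, F_Y \rangle \in GL_i^{(k-1)} \ \text{ for all } Y \subset X,\ |Y| = k-1 .
$$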
As in the DMA algorithm, which is an adaptation of the Apriori algorithm, at the k-th iteration the set of candidate sets is obtained by applying the Fuzzy Apriori Gen function to $L^{(k-1)}$. We denote this set by $CA^{(k)}$; more exactly, $CA^{(k)} = \text{Fuzzy Apriori Gen}(L^{(k-1)})$.
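The section does not spell out Fuzzy Apriori Gen. The sketch below is a hypothetical reconstruction that follows the classical Apriori-gen join and prune steps on sorted fuzzy itemsets; the extra rule that an itemset may not contain two fuzzy sets of the same attribute is an assumption, and `fuzzy_apriori_gen` is an illustrative name.

```python
from itertools import combinations
from typing import Set, Tuple

FuzzyItem = Tuple[str, str]            # hypothetical: (attribute, fuzzy-set label)
FuzzyItemset = Tuple[FuzzyItem, ...]   # items kept in sorted order

def fuzzy_apriori_gen(prev: Set[FuzzyItemset]) -> Set[FuzzyItemset]:
    """Candidate fuzzy k-itemsets from large fuzzy (k-1)-itemsets (Apriori-style sketch)."""
    candidates: Set[FuzzyItemset] = set()
    for a, b in combinations(sorted(prev), 2):
        # Join step: a and b share their first k-2 items. Since a < b and the
        # prefixes match, a[-1] < b[-1], so the joined candidate stays sorted.
        if a[:-1] != b[:-1]:
            continue
        # Assumed fuzzy restriction: no two fuzzy sets of the same attribute.
        if a[-1][0] == b[-1][0]:
            continue
        cand = a + (b[-1],)
        # Prune step: every (k-1)-subset of the candidate must be in prev.
        if all(sub in prev for sub in combinations(cand, len(cand) - 1)):
            candidates.add(cand)
    return candidates

L1 = {(("age", "young"),), (("age", "old"),), (("income", "high"),)}
print(sorted(fuzzy_apriori_gen(L1)))
# [(('age', 'old'), ('income', 'high')), (('age', 'young'), ('income', 'high'))]
```

The pair ("age", "old") and ("age", "young") is never joined because both fuzzy sets belong to the same attribute.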
For each site $S_i$ ($1 \le i \le n$), we denote by $CG_i^{(k)}$ the set of candidate fuzzy itemsets generated by applying Fuzzy Apriori Gen to $GL_i^{(k-1)}$, i.e., $CG_i^{(k)} = \text{Fuzzy Apriori Gen}(GL_i^{(k-1)})$.
Because $GL_i^{(k-1)} \subseteq L^{(k-1)}$, $CG_i^{(k)}$ is a subset of $CA^{(k)}$. In the following, we denote $CG^{(k)} = \bigcup_{i=1}^{n} CG_i^{(k)}$.
Theorem 1. For every $k > 1$, the set $L^{(k)}$ of all globally large fuzzy k-itemsets is a subset of $CG^{(k)} = \bigcup_{i=1}^{n} CG_i^{(k)}$.
By Theorem 1, we can use the set $CG^{(k)}$, which is a superset of $L^{(k)}$, as the candidate set instead of $CA^{(k)}$; $CG^{(k)}$ can be much smaller than $CA^{(k)}$.
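The saving promised by Theorem 1 can be checked on a toy example. The sketch below uses a compact Apriori-style generator (classical join and prune, plus the assumed rule of no repeated attribute within an itemset), builds $CG^{(k)}$ as the union of the per-site sets $CG_i^{(k)}$, and compares it with $CA^{(k)}$. The sets standing in for $L^{(1)}$ and $GL_i^{(1)}$, and all function names, are invented for illustration.

```python
from itertools import combinations
from typing import List, Set, Tuple

FuzzyItem = Tuple[str, str]            # hypothetical: (attribute, fuzzy-set label)
FuzzyItemset = Tuple[FuzzyItem, ...]

def fuzzy_apriori_gen(prev: Set[FuzzyItemset]) -> Set[FuzzyItemset]:
    """Apriori-style join + prune; assumes no two fuzzy sets of one attribute per itemset."""
    out: Set[FuzzyItemset] = set()
    for a, b in combinations(sorted(prev), 2):
        if a[:-1] == b[:-1] and a[-1][0] != b[-1][0]:
            cand = a + (b[-1],)
            if all(s in prev for s in combinations(cand, len(cand) - 1)):
                out.add(cand)
    return out

def candidate_sets(l_prev: Set[FuzzyItemset], gl_prev: List[Set[FuzzyItemset]]):
    """Return CA^(k), the list of CG_i^(k), and CG^(k) = union of the CG_i^(k)."""
    ca = fuzzy_apriori_gen(l_prev)
    cg_sites = [fuzzy_apriori_gen(gl_i) for gl_i in gl_prev]
    cg = set().union(*cg_sites)
    return ca, cg_sites, cg

A, B, C, D = ("a", "hi"), ("b", "hi"), ("c", "hi"), ("d", "hi")
L1 = {(A,), (B,), (C,), (D,)}          # stands in for the globally large fuzzy 1-itemsets
GL1 = [{(A,), (B,)}, {(C,), (D,)}]     # stands in for GL_1^(1) and GL_2^(1)
ca, cg_sites, cg = candidate_sets(L1, GL1)
print(len(ca), len(cg))                # 6 2
print(cg <= ca)                        # True
```

Here $CG^{(2)}$ holds 2 candidates against 6 in $CA^{(2)}$: pairs such as $\langle A, C \rangle$, whose items are never gl-large at the same site, are not generated.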
Thus, the candidate set for $L^{(k)}$ is generated at the k-th iteration as follows. First, the candidate set $CG_i^{(k)}$ is generated locally at each site $S_i$. Then the sites exchange fuzzy support counts and compute the set of gl-large fuzzy itemsets $GL_i^{(k)}$. Based on $GL_i^{(k)}$, the candidate fuzzy itemsets at $S_i$ for the (k+1)-st iteration can then be generated.
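One iteration of this scheme can be simulated in a single process. In the sketch below, transactions are assumed to be already fuzzified (each maps a fuzzy item to its membership degree), the min t-norm and the normalisations by $|DB|$ and $|DB_i|$ are assumptions, and the exchange of fuzzy support counts is modelled by counting directly in every site's database, whereas a real deployment would send these counts as messages. All names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

FuzzyItem = Tuple[str, str]                  # hypothetical: (attribute, fuzzy-set label)
FuzzyItemset = Tuple[FuzzyItem, ...]
FuzzyTransaction = Dict[FuzzyItem, float]    # fuzzy item -> membership degree

def fuzzy_apriori_gen(prev: Set[FuzzyItemset]) -> Set[FuzzyItemset]:
    """Apriori-style join + prune; assumes no two fuzzy sets of one attribute per itemset."""
    out: Set[FuzzyItemset] = set()
    for a, b in combinations(sorted(prev), 2):
        if a[:-1] == b[:-1] and a[-1][0] != b[-1][0]:
            cand = a + (b[-1],)
            if all(s in prev for s in combinations(cand, len(cand) - 1)):
                out.add(cand)
    return out

def fuzzy_count(itemset: FuzzyItemset, db: List[FuzzyTransaction]) -> float:
    """Fuzzy support count: sum over transactions of the min membership degree."""
    return sum(min(t.get(item, 0.0) for item in itemset) for t in db)

def iteration(gl_prev: List[Set[FuzzyItemset]], dbs: List[List[FuzzyTransaction]],
              minsup: float) -> List[Set[FuzzyItemset]]:
    """From GL_i^(k-1) at every site, compute GL_i^(k) at every site."""
    total = sum(len(db) for db in dbs)
    gl_next = []
    for i, db in enumerate(dbs):
        cg_i = fuzzy_apriori_gen(gl_prev[i])                    # local candidate generation
        gl_i = set()
        for x in cg_i:
            counts = [fuzzy_count(x, other) for other in dbs]   # simulated count exchange
            if sum(counts) / total >= minsup and counts[i] / len(db) >= minsup:
                gl_i.add(x)                                     # globally and locally large
        gl_next.append(gl_i)
    return gl_next

A, B, C = ("a", "hi"), ("b", "hi"), ("c", "hi")
db_1 = [{A: 1.0, B: 0.8}, {A: 0.6, B: 0.6, C: 0.2}]
db_2 = [{A: 0.1, C: 0.9}, {B: 0.2, C: 0.7}]
gl_1 = [{(A,), (B,)}, {(C,)}]          # GL_1^(1), GL_2^(1) for this data with minsup = 0.3
print(iteration(gl_1, [db_1, db_2], 0.3))   # [{(('a', 'hi'), ('b', 'hi'))}, set()]
```

Site $S_1$ keeps $\langle A, B \rangle$ (global support 1.4 / 4 = 0.35, local support 0.7), while $S_2$ has a single gl-large 1-itemset and therefore generates no candidates.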
3.2 Local pruning of candidate sets
Lemma 3 can be used to perform a local pruning of the set of candidate fuzzy itemsets. At a site $S_i$, after the set of candidate fuzzy itemsets $CG_i^{(k)}$ has been generated, determining whether a candidate fuzzy itemset $\langle X, F_X \rangle \in CG_i^{(k)}$ is a gl-large fuzzy itemset requires requesting its fuzzy support count from all other sites.
For some candidates, this request can be avoided by a local pruning technique. The basic idea is that if a candidate fuzzy itemset $\langle X, F_X \rangle \in CG_i^{(k)}$ is not locally large at site $S_i$, there is no need for $S_i$ to compute its global support to find out whether it is globally large. This is possible because in this case either $\langle X, F_X \rangle$ is not globally large, or it is locally large at some other site (Lemma 3); hence only the sites where $\langle X, F_X \rangle$ is locally large need to be responsible for finding its global support count. We use $LL_i^{(k)}$ to denote the candidate fuzzy itemsets in $CG_i^{(k)}$ that are locally large at site $S_i$.
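A minimal sketch of this local pruning step. It assumes that the transactions at $S_i$ are already fuzzified, that the min t-norm is used, and that local support is normalised by $|DB_i|$; the function names are hypothetical. It returns $LL_i^{(k)}$, and every candidate it drops is one for which $S_i$ saves a global support-count request.

```python
from typing import Dict, List, Set, Tuple

FuzzyItem = Tuple[str, str]                  # hypothetical: (attribute, fuzzy-set label)
FuzzyItemset = Tuple[FuzzyItem, ...]
FuzzyTransaction = Dict[FuzzyItem, float]    # fuzzy item -> membership degree (pre-fuzzified)

def fuzzy_count(itemset: FuzzyItemset, db: List[FuzzyTransaction]) -> float:
    """Fuzzy support count with the min t-norm (an assumption)."""
    return sum(min(t.get(item, 0.0) for item in itemset) for t in db)

def local_pruning(cg_i: Set[FuzzyItemset], local_db: List[FuzzyTransaction],
                  minsup: float) -> Set[FuzzyItemset]:
    """LL_i^(k): the candidates of CG_i^(k) that are locally large at S_i.

    Only these candidates need a fuzzy support-count request to the other sites;
    the others are either not globally large or are handled by a site where they
    are locally large (Lemma 3).
    """
    if not local_db:
        return set()
    return {x for x in cg_i if fuzzy_count(x, local_db) / len(local_db) >= minsup}

A, B, C = ("a", "hi"), ("b", "hi"), ("c", "hi")
db_1 = [{A: 1.0, B: 0.8}, {A: 0.6, B: 0.6, C: 0.2}]
cg_1 = {(A, B), (A, C), (B, C)}
ll_1 = local_pruning(cg_1, db_1, 0.3)
print(ll_1, len(cg_1) - len(ll_1))   # {(('a', 'hi'), ('b', 'hi'))} 2
```

With minsup = 0.3, $\langle A, C \rangle$ and $\langle B, C \rangle$ have local support 0.1, so site $S_1$ requests global counts only for $\langle A, B \rangle$.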
3.3 The algorithm outline
Algorithm 1 presents in detail the FUZZY-DMA algorithm for distributed mining of association
WEBIST 2005 - INTERNET COMPUTING
208