2 RELATED WORK
Many studies have employed taxonomies and
ontologies as background knowledge in association
rule mining in order to enhance the knowledge
discovery process. (Hou, et al., 2005) use domain
knowledge to generalize low-level rules discovered
by traditional rule mining algorithms, in order to
obtain fewer and clearer high-level rules. In
(Brisson, et al., 2005), domain knowledge is used in
the pre-processing and post-processing steps. The
pre-processing step uses an ontology to guide the
construction of specific datasets for particular
mining tasks. In the post-processing step, mined
rules are interpreted and filtered, as terms are
generalized based on the ontology.
“However, in many real-world applications, the
taxonomic structure may not be crisp but fuzzy.”
(Chen, et al., 2000, p.47). For this purpose, (Chen, et
al., 2000) developed an algorithm to mine
generalized association rules with fuzzy taxonomic
structures. The algorithm considers the R-interest
measure, which is used to eliminate redundant and
inconsistent rules. The fuzzy association rule mining
approach developed by (Farzanyar, et al., 2006) is
driven by domain knowledge in order to make the
rules more visual, more interesting and more
understandable.
(Escovar, et al., 2006) proposed another approach
that uses fuzzy ontologies to represent the semantic
similarity relations among mined data. In this case,
the mining algorithm (XSSDM) considers a new
measure, called minimum similarity (minsim). If two
items have a similarity degree greater than or equal
to minsim, fuzzy associations are made and can be
expressed in the association rules extracted by the
algorithm. For example, if item1 and item2 have a
degree of similarity greater than or equal to minsim
in the fuzzy ontology, a fuzzy association is made
and a fuzzy itemset is created (represented as
item1~item2).
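The minsim test described above can be illustrated with a
minimal sketch. It is not the XSSDM algorithm itself: the
similarity degrees, the item names and the function
fuzzy_associations are hypothetical and chosen only for
illustration.

```python
from itertools import combinations

# Hypothetical similarity degrees taken from a fuzzy ontology
# (illustrative values, not from the cited work).
similarity = {
    frozenset({"Apple", "Tomato"}): 0.7,
    frozenset({"Apple", "Kaki"}): 0.4,
    frozenset({"Kaki", "Tomato"}): 0.2,
}

def fuzzy_associations(items, similarity, minsim):
    """Return fuzzy itemsets 'a~b' for pairs whose similarity is >= minsim."""
    pairs = []
    for a, b in combinations(sorted(items), 2):
        # Pairs missing from the ontology are treated as having similarity 0.
        degree = similarity.get(frozenset({a, b}), 0.0)
        if degree >= minsim:
            pairs.append(f"{a}~{b}")
    return pairs

fuzzy_items = fuzzy_associations(["Apple", "Kaki", "Tomato"], similarity, minsim=0.6)
print(fuzzy_items)
```

With these assumed degrees, only the pair Apple and Tomato
reaches a minsim of 0.6, so a single fuzzy itemset is created.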
Although generalized association rule mining
approaches based on fuzzy ontologies express
semantically richer information, they may produce a
large number of extracted rules, many of them
redundant. Redundancy treatment has therefore
become an interesting research topic. In (Han and
Fu, 1999), multiple-level association rule mining
was proposed to reduce the number of generalized
association rules. It consists of defining a different
minsup value for each level of a given taxonomy,
with higher levels having larger minsup values.
Other approaches, like (Kunkle, et al., 2008), reduce
the number of generalized and redundant
association rules during the pattern extraction
process. (Kunkle, et al., 2008) implemented the
MFGI_class algorithm based on maximal frequent
itemset theory (Bayardo,
1998). (Chen, et al., 2000) and (Oliveira, et al.,
2007) are approaches that treat the problem
after the processing stage. The first one performs
the generalization process based on the R-interest
measure. This measure prunes redundant rules by
considering only the rules whose degree of support
or confidence is at least R times the expected degree
of support or confidence. (Oliveira, et al., 2007)
generalizes only if the descendants of an ancestor
generate rules and the rule of the ancestor has a
support value x% greater than that of the
descendant whose rule has the highest support among
its siblings. (Miani, et al., 2009) proposed the NARFO
algorithm, which decreases the number of redundant
rules through its generalization and redundancy
treatment steps.
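The R-interest pruning mentioned above can be sketched as
follows. This is only an illustration under stated
assumptions: the expected support and confidence are taken
as given inputs (how they are derived from the ancestor rule
is not covered here), the comparison is read as "at least R
times", and the rule names and values are hypothetical.

```python
def is_r_interesting(support, confidence, expected_support, expected_confidence, r):
    """Keep a rule if support or confidence is at least r times the expected value."""
    return support >= r * expected_support or confidence >= r * expected_confidence

# Hypothetical rules with assumed expected values.
candidate_rules = [
    {"rule": "Apple -> Turkey", "support": 0.15, "confidence": 0.50,
     "expected_support": 0.10, "expected_confidence": 0.50},
    {"rule": "Kaki -> Turkey", "support": 0.10, "confidence": 0.50,
     "expected_support": 0.10, "expected_confidence": 0.50},
]

R = 1.1
kept = [
    c["rule"]
    for c in candidate_rules
    if is_r_interesting(c["support"], c["confidence"],
                        c["expected_support"], c["expected_confidence"], R)
]
print(kept)
```

In this sketch the second rule is pruned because neither its
support nor its confidence exceeds what would be expected
from its ancestor, so it carries no additional information.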
NARFO* shares the features of NARFO, but further
reduces the number of rules NARFO produces by
introducing the minGen parameter, yielding rules
that are semantically richer and free of misleading
information.
3 NARFO* ALGORITHM
This section explains the NARFO* algorithm. The
introduction of the minGen parameter is the main
contribution of this work. MinGen is especially
useful for generalizing rules mined with a low
minimum support, without semantic loss. If X% of
the descendants of an ancestor are included in
rules, the generalization is performed, and the
items that are not part of the generalization are
shown to avoid conveying wrong information.
Considering the fuzzy ontology of figure 1, if the
minGen value is 0.6 and rules like Apple → Turkey
and Kaki → Turkey are generated, the algorithm
generalizes these rules to Fruit → Turkey, since
Apple and Kaki are more than 60% of Fruit’s
descendants. The rule is shown as Fruit(-Tomato) →
Turkey, indicating that the item Tomato did not
take part in the generalization.
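The minGen generalization described above can be sketched
as follows. This is a minimal illustration, not the NARFO*
implementation. It assumes that Fruit has exactly the three
descendants Apple, Kaki and Tomato, that each rule has a
single antecedent item, and that the threshold test is
"fraction of descendants >= minGen". The function name
generalize and the "(-A)(-B)" notation for several excluded
items are also assumptions.

```python
from collections import defaultdict

# Assumed fragment of the ontology: Fruit has three descendants.
descendants = {"Fruit": ["Apple", "Kaki", "Tomato"]}

def generalize(rules, descendants, min_gen):
    """rules: list of (antecedent_item, consequent). Returns rule strings."""
    parent_of = {child: parent
                 for parent, children in descendants.items()
                 for child in children}
    # (ancestor, consequent) -> descendants that appear in some rule
    grouped = defaultdict(set)
    for antecedent, consequent in rules:
        ancestor = parent_of.get(antecedent)
        if ancestor is not None:
            grouped[(ancestor, consequent)].add(antecedent)

    output, covered = [], set()
    for (ancestor, consequent), present in grouped.items():
        children = descendants[ancestor]
        if len(present) / len(children) >= min_gen:
            # Show the descendants left out of the generalization.
            missing = [c for c in children if c not in present]
            label = ancestor + "".join(f"(-{c})" for c in missing)
            output.append(f"{label} -> {consequent}")
            covered.update((c, consequent) for c in present)
    # Rules that were not generalized are kept unchanged.
    output.extend(f"{a} -> {c}" for a, c in rules if (a, c) not in covered)
    return output

mined = [("Apple", "Turkey"), ("Kaki", "Turkey")]
print(generalize(mined, descendants, min_gen=0.6))
```

With minGen set to 0.6, two of the three assumed descendants
(about 67%) appear in rules, so the sketch yields the single
rule Fruit(-Tomato) -> Turkey. With a stricter value such as
0.7, the two original rules would be kept as they are.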
Besides applying the minGen parameter, the
algorithm also eliminates redundant rules while
preserving their semantics. If both Apple~Tomato →
Chicken and Apple → Chicken are extracted,
NARFO* only considers the first one, marking with
a plus sign (+) the more relevant item in the fuzzy
itemset of the rule: Apple(+)~Tomato → Chicken.
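The redundancy treatment above can be sketched as follows.
This is a minimal illustration under assumptions, not the
NARFO* implementation: a rule is represented as a tuple of
antecedent items plus a consequent, and the "more relevant"
item is taken to be the one that also generates the rule on
its own. The function name remove_redundant is hypothetical.

```python
def remove_redundant(rules):
    """rules: list of (antecedent_items, consequent), antecedent_items a tuple.

    A tuple with more than one item stands for a fuzzy itemset (a~b).
    """
    # Single-item rules, used to detect which fuzzy member gets the (+) mark.
    plain = {(items[0], cons) for items, cons in rules if len(items) == 1}

    result, fuzzy_members = [], set()
    for items, cons in rules:
        if len(items) > 1:
            marked = [f"{i}(+)" if (i, cons) in plain else i for i in items]
            result.append(("~".join(marked), cons))
            fuzzy_members.update((i, cons) for i in items)

    # Keep only the single-item rules not already covered by a fuzzy rule.
    for items, cons in rules:
        if len(items) == 1 and (items[0], cons) not in fuzzy_members:
            result.append((items[0], cons))
    return result

extracted = [
    (("Apple", "Tomato"), "Chicken"),
    (("Apple",), "Chicken"),
    (("Kaki",), "Turkey"),
]
print(remove_redundant(extracted))
```

Here the rule Apple → Chicken is dropped because the fuzzy
rule already covers it, and Apple receives the plus mark
inside the fuzzy itemset.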
3.1 Data Scanning
This step identifies the items in the database,
generating itemsets of size one (1-itemsets).
Considering the fuzzy ontology of figure 1, this
NARFO* ALGORITHM - Optimizing the Process of Obtaining Non-redundant and Generalized Semantic Association
Rules