Naturally, categories, movies, roles, actors,
awards and directors are concepts, with categories,
roles and awards specialized by some other
concepts. Relations are represented by directed
arrows. In order to allow the discovery of patterns
involving real actors and directors, the ontology is
enriched with a leaf concept for each known actor
and director. From this ontology and each row in a
denormalized table containing one row for each
participation in a movie, we can construct the
dataset to mine. To find patterns of the form
(director, category), (actor, role, category),
(category, award), we only need the axioms that
define equality for each leaf concept.
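The construction described above can be sketched as follows. This is only a minimal illustration, not the Onto4AR implementation: the column names, the toy rows and the helper functions `to_transaction` and `pattern_support` are all assumptions introduced here. Each row becomes a set of (concept, leaf) items, and two items are considered equal only when their leaf concepts are equal.

```python
from collections import Counter

# Hypothetical denormalized rows (invented for illustration):
# one row per participation in a movie.
ROWS = [
    {"movie": "M1", "director": "D1", "actor": "A1",
     "role": "Hero", "category": "Drama", "award": "AwardX"},
    {"movie": "M1", "director": "D1", "actor": "A2",
     "role": "Villain", "category": "Drama", "award": "AwardX"},
    {"movie": "M2", "director": "D2", "actor": "A1",
     "role": "Hero", "category": "Comedy", "award": None},
]


def to_transaction(row):
    """Map a row to a set of (concept, leaf) items.

    Item equality is plain tuple equality, standing in for the
    equality axiom defined on each leaf concept.
    """
    return frozenset(
        (concept, leaf)
        for concept, leaf in row.items()
        if concept != "movie" and leaf is not None
    )


def pattern_support(rows, concepts):
    """Count, per row, the leaf combinations observed for `concepts`."""
    counts = Counter()
    for row in rows:
        items = dict(to_transaction(row))
        # Skip rows that lack one of the requested concepts.
        if all(c in items for c in concepts):
            counts[tuple(items[c] for c in concepts)] += 1
    return counts


director_category = pattern_support(ROWS, ("director", "category"))
category_award = pattern_support(ROWS, ("category", "award"))
```

Note that support is counted here per row (per participation); counting per movie would require grouping the rows first.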
The identification of frequent molecular
fragments presents additional challenges to the
framework, since those patterns are structured
patterns, in the form of graphs. In addition to this
structural nature, a molecule may contain multiple
atoms of the same chemical element. To deal with
these particularities, the framework only requires
the definition of a new class of constraints:
structural constraints. A structural constraint is a
content constraint that defines a differentiated
areJoinable axiom: two itemsets are joinable only
if the maximal proper suffix of the first itemset is
equal to the maximal proper prefix of the second
one.
$$\forall\, s = (s_0 \ldots s_n),\ t = (t_0 \ldots t_n):\quad \mathit{areJoinable}(s, t) \Leftrightarrow s_1 \ldots s_n = t_0 \ldots t_{n-1}$$
Note that this predicate states the new conditions
for generating a candidate, which are the same as
those used by sequential pattern mining
algorithms. To avoid the problem of multiple atoms
of the same element, we can represent a molecule
as a chain of bonds, each one involving two
different atoms, as represented in Figure 2-bottom.
This is achieved by giving each atom an index,
which allows multiple otherwise identical bonds.
For example, the ring of carbons in
Figure 2-bottom (right) would be represented as
(C_0–C_1, C_0–C_3, C_1–C_2, C_2–C_3). With these simple
tools, it is possible to identify exactly the same
patterns found by graph-mining algorithms.
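A minimal sketch of the structural join on bond chains is given below. It assumes that a molecule is stored as a tuple of bonds and that each atom is an (element, index) pair; the names `are_joinable`, `join` and `RING` are hypothetical and are not taken from the framework itself.

```python
def are_joinable(s, t):
    """Structural join test on two equal-length, non-empty sequences.

    s and t are joinable when the maximal proper suffix of s
    (everything but its first item) equals the maximal proper
    prefix of t (everything but its last item).
    """
    return len(s) == len(t) and len(s) > 0 and s[1:] == t[:-1]


def join(s, t):
    """Build the candidate of length len(s) + 1 from two joinable sequences."""
    if not are_joinable(s, t):
        raise ValueError("sequences are not joinable")
    return s + t[-1:]


def atom(element, index):
    """An indexed atom, so that atoms of the same element stay distinct."""
    return (element, index)


def bond(a, b):
    """A bond between two different indexed atoms."""
    return (a, b)


# The ring of four carbons written as a chain of bonds:
# (C0-C1, C0-C3, C1-C2, C2-C3)
C = [atom("C", i) for i in range(4)]
RING = (bond(C[0], C[1]), bond(C[0], C[3]),
        bond(C[1], C[2]), bond(C[2], C[3]))

# Joining the two length-3 sub-chains regenerates the full ring.
candidate = join(RING[:3], RING[1:])
```

Because each atom carries its index, the bonds C0–C1 and C2–C3 remain distinct items even though both link two carbons.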
5 CONCLUSIONS
Recent advances in the area of knowledge
representation make it possible to represent
background knowledge effectively using
ontologies. Since one of the main drawbacks of data
mining in general, and of pattern mining in
particular, is that it ignores domain knowledge, it is
time to use those advances to overcome that limitation.
This paper explains how the Onto4AR
framework can solve some of the main difficulties
faced by transactional pattern mining approaches,
such as dealing with multiple concepts in the same
transaction or dealing with structured data.
We showed that by incorporating background
knowledge into the core of the mining process,
using domain ontologies and defining a set of
constraints over them, it is possible to
address those difficulties naturally.
From the case studies described, the potential of
the Onto4AR framework is apparent.
Indeed, the framework provides the necessary tools
to overcome several difficulties faced by pattern
mining techniques. Its conception, based on a
standard and widely recognized instrument for
representing existing domain knowledge, is one of
its strongest points, followed closely by its
simplicity and its extensibility.
However, experiments show that candidate-based
algorithms are not the most adequate to
perform the discovery. The explosion of
candidates resulting from the existence of multiple
equivalent concepts (as defined by their equal
predicate) strongly impairs algorithm performance.
Nevertheless, since several algorithms following
other approaches have been proposed with fair
success, it is likely that they can be adapted to
work in this new context.