Figure 9: Time breakdown on the census dataset (a), mushroom dataset (b) and T20I6D300K dataset (c).
oriented algorithms to do the mining. The output
rules are also translated into an XML representation.
As a consequence, the loosely-coupled architecture
of XMineRule makes it difficult to use optimiza-
tions based on the pruning of the search space, since
constraints can be evaluated only at pre- or post-
processing time. From the semantics perspective,
items have an XML-based hierarchical tree structure
in which rules describe interesting relations among
fragments of the XML source (Feng and Dillon,
2004). In contrast, in our approach, items are denoted
by using simple structured data from the domains of
basic data types, favouring both the implementation
of efficient data structures and the design of powerful
domain-specific optimizations evaluated as deep as
possible in the extraction process. Domain knowledge
is linked to items through XML metadata elements.
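
The difference between evaluating a constraint at post-processing time and evaluating it inside the extraction process can be illustrated with a minimal sketch. It is not the XQuake or XMineRule implementation: the function frequent_itemsets, the price table and the within_budget constraint are hypothetical names introduced only for this example, and the constraint is assumed to be anti-monotone (if an itemset violates it, every superset does too).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, constraint=None):
    """Levelwise (Apriori-style) search for frequent itemsets.

    `constraint` is a hypothetical anti-monotone predicate: once it fails
    for an itemset it fails for every superset, so failing candidates can
    be discarded before their support is counted.
    Returns (dict itemset -> support, number of candidates counted).
    """
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items]
    result, counted = {}, 0
    while level:
        if constraint is not None:
            # Constraint pushed into the search: prune before counting.
            level = [c for c in level if constraint(c)]
        frequent = []
        for cand in level:
            counted += 1
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                result[cand] = support
                frequent.append(cand)
        # Join step: build (k+1)-candidates whose k-subsets are all frequent.
        known = set(frequent)
        next_level = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(s) in known for s in combinations(union, len(a))
            ):
                next_level.add(union)
        level = list(next_level)
    return result, counted


# Toy data; the prices play the role of domain knowledge attached to items.
transactions = [
    {"bread", "milk", "wine"},
    {"bread", "milk"},
    {"bread", "wine"},
    {"milk", "wine"},
    {"bread", "milk", "wine"},
]
price = {"bread": 2, "milk": 1, "wine": 10}


def within_budget(itemset, budget=5):
    # Anti-monotone because all prices are non-negative.
    return sum(price[i] for i in itemset) <= budget


# Loosely coupled: mine everything, then filter at post-processing time.
all_frequent, counted_loose = frequent_itemsets(transactions, 3)
post_filtered = {s: n for s, n in all_frequent.items() if within_budget(s)}

# Tightly coupled: the same constraint prunes the search space.
pushed, counted_pushed = frequent_itemsets(transactions, 3, within_budget)

print(pushed == post_filtered, counted_loose, counted_pushed)  # True 7 3
```

Both strategies return the same itemsets, but on this toy input the pushed constraint counts the support of 3 candidates instead of 7, which is the kind of saving a loosely-coupled architecture cannot exploit.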
4.2 Conclusions and Future Work
In this paper, we proposed a new QL as a solution to
the XML data mining problem. In our view, a native
XML database is used as storage for KDD entities.
DM tasks are expressed in an XQuery-like language.
The syntax of the language is flexible enough to spec-
ify a variety of different mining tasks by means of
user-defined functions in the statements. These functions
let the user express personalized, sophisticated constraints
based, for example, on domain knowledge.
The first empirical assessment reported in Section 3.2
shows promising results, although it covers only the
XML frequent itemset mining problem.
Summing up, our project aims at a completely
general solution for DM. Clearly, generality is
even more relevant for XML-based languages, since
no general-purpose XML mining language has
yet been proposed (to the best of our knowledge). One
line of ongoing work is the exploitation of
ontologies to represent the metadata. As an example,
ontologies may represent enriched taxonomies, used
to describe the application domain by means of data
and object properties. As a consequence, they may
provide enhanced possibilities to constrain the mining
queries in a more expressive way. This opportunity is
even more relevant in our project, since ontologies
are typically represented via the Web Ontology Language
(OWL) (W3C World Wide Web Consortium,
2004), which is de facto an XML-based language.
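
As a rough illustration of how a taxonomy could constrain a mining query, the sketch below keeps only the itemsets whose items all fall under a given concept. It assumes a plain child-to-parent dictionary in place of a real OWL ontology; the taxonomy PARENT and the helpers ancestors, is_a and all_under are hypothetical and are not part of the proposed language.

```python
# Hypothetical taxonomy (child -> parent). In the setting discussed above
# it would be derived from an ontology rather than written by hand.
PARENT = {
    "milk": "dairy",
    "cheese": "dairy",
    "dairy": "food",
    "bread": "bakery",
    "bakery": "food",
    "wine": "beverage",
}


def ancestors(item, parent=PARENT):
    """Return the concepts above `item`, nearest first."""
    chain = []
    while item in parent:
        item = parent[item]
        chain.append(item)
    return chain


def is_a(item, concept, parent=PARENT):
    """True if `item` is `concept` itself or one of its descendants."""
    return item == concept or concept in ancestors(item, parent)


def all_under(concept):
    """Build a constraint: every item of the itemset falls under `concept`."""
    return lambda itemset: all(is_a(i, concept) for i in itemset)


itemsets = [{"milk", "cheese"}, {"milk", "bread"}, {"bread", "wine"}, {"cheese"}]

only_food = [s for s in itemsets if all_under("food")(s)]
only_dairy = [s for s in itemsets if all_under("dairy")(s)]

print(only_food)
print(only_dairy)
```

A constraint of this form is anti-monotone, since adding items to an itemset that already contains an item outside the concept cannot make it valid again, so it could in principle be evaluated during the search rather than afterwards.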
REFERENCES
Agrawal, R. and Srikant, R. (1994). Fast algorithms for
mining association rules. In VLDB ’94, pages 487–
499, Santiago de Chile, Chile.
Braga, D., Campi, A., Ceri, S., Klemettinen, M., and
Lanzi, P. (2003). Discovering interesting information
in XML data with association rules. In SAC ’03, pages
450–454, Melbourne, Florida.
Euler, T., Klinkenberg, R., Mierswa, I., Scholz, M., and
Wurst, M. (2006). YALE: rapid prototyping for com-
plex data mining tasks. In KDD ’06, pages 935–940,
Philadelphia, PA, USA.
Feng, L. and Dillon, T. S. (2004). Mining Interesting XML-
Enabled Association Rules with Templates. In KDID
’04, pages 66–88, Pisa, Italy.
Holupirek, A., Grün, C., and Scholl, M. H. (2009). BaseX
and DeepFS joint storage for filesystem and database.
In EDBT ’09, pages 1108–1111, Saint Petersburg,
Russia.
Imielinski, T. and Mannila, H. (1996). A database perspective
on knowledge discovery. Commun. ACM,
39(11):58–64.
Meo, R. and Psaila, G. (2006). An XML-based database for
knowledge discovery. In EDBT ’06, pages 814–828,
Munich, Germany.
Mitchell, T. M. (1997). Machine Learning. McGraw-Hill.
Romei, A., Ruggieri, S., and Turini, F. (2006). KDDML: a
middleware language and system for knowledge dis-
covery in databases. Data Knowl. Eng., 57(2):179–
220.
The Data Mining Group (2009). The Predictive
Model Markup Language (PMML). Version 4.0.
www.dmg.org/v4-0/GeneralStructure.html.
W3C World Wide Web Consortium (2004). OWL Web Ontology
Language. W3C Recommendation 10 February 2004.
http://www.w3.org/TR/owl-features.
W3C World Wide Web Consortium (2007). XQuery 1.0:
An XML Query Language. W3C Recommendation
23 January 2007.
http://www.w3.org/TR/Query.
XQUAKE - An XQuery-like Language for Mining XML Data