Table 1: Summary of the XQuake language.
Inductive Database requirement      XQuake perspective
Data and model storage              Native XML Database (models represented via PMML)
KDD process representation          XQuery program + special mining functions
KDD process parametrization         Parametrization of XQuery functions
Closure principle                   Achieved by means of the XQuery closure
Constraints & interesting measures  XQuery expression + built-in function library
Output specification                XQuery expression (optional) + built-in function library
Data binding                        Based on the PMML mining schema
sented in this paper offers an idea of its potential and
advantages. First, XML data is mined where it resides, in a
native XML database. Second, great attention has been paid to
the closure principle: the examples highlight the ability to
combine the results of knowledge extraction in order to evaluate
certain indicators, to compose preprocessing, data mining and
post-processing, and to use background knowledge to filter
models. Finally, the KDD process now has an integrated view and
can easily be made modular and parametric. Table 1 summarizes
the main features of XQuake according to the inductive database
principles.
Since our project aims at a completely general solution for XML
data mining, several extensions still need in-depth
investigation. Ongoing work concerns the integration of both
further kinds of knowledge (specifically, sequential patterns)
and a rich library of mining algorithms. We are also working on
providing the formal semantics of XQuake. Future work can go in
two (often orthogonal) directions: (i) the exploitation of
ontologies to represent metadata (on the expressiveness side),
and (ii) the study of query rewriting techniques for optimization
purposes (on the architectural side). The study of more
sophisticated high-level GUIs for designing queries is another
aspect to be considered in the future.