metadata. In RDF, statements about resources are
made as subject-predicate-object expressions, called
triples. Hence, our repository is defined as a triple
store, with OpenLink Virtuoso as the storage engine.
The repository is provided as a Web Service, and an
application for metadata management is built on top
of it. JavaServer Pages (JSP) and Asynchronous
JavaScript and XML (AJAX) are used to implement the
application and the graphical user interface.
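To make the triple model concrete, the following is a minimal sketch of how metadata statements can be held as subject-predicate-object triples and retrieved by pattern matching. It is a toy in-memory store written for illustration only; it is not the repository's implementation and does not use the Virtuoso API. The class names (Triple, TripleStore) and the ex: identifiers are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Toy in-memory triple store (illustrative stand-in, not a real engine)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record one statement; duplicates are ignored because a set is used."""
        self._triples.add(Triple(subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


# Hypothetical metadata about a dataset: its schema and its lineage.
store = TripleStore()
store.add("ex:sales_dataset", "ex:hasAttribute", "ex:revenue")
store.add("ex:sales_dataset", "ex:derivedFrom", "ex:raw_sales")
store.add("ex:revenue", "ex:hasDataType", "xsd:decimal")

# Lineage lookup: which source was the dataset derived from?
lineage = store.match(subject="ex:sales_dataset", predicate="ex:derivedFrom")
```

In a production triple store, the same lookup would typically be expressed as a SPARQL query against the engine rather than as a Python filter; the wildcard pattern above only mirrors the idea of a triple pattern.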
7 CONCLUSION AND FUTURE WORK
The process of knowledge discovery is challenging.
Data relevant to the analysis needs to be selected,
preprocessed, mined and finally evaluated. Beginners
are overwhelmed by the myriad of operators, and more
experienced users limit their activity to a few known
approaches. Thorough user assistance is therefore
necessary, and systems that aim to assist the user
during this process have been built. We studied these
systems and identified the metadata they use to
provide user support during the KDD process. We found
that important metadata, such as domain knowledge and
lineage, which can make the life of a data analyst
easier, have not been considered. We provided a
classification of the metadata found and proposed a
comprehensive metadata framework that captures the
complete range of metadata needed to assist the user
during the whole KDD process. We showed the importance
of such metadata in a real project by implementing a
metadata repository to store and manage the whole
range of metadata.
In future work, we plan to extend the domain
knowledge incorporated into the repository and to
develop tools for exploiting the metadata. We will
test different ways of reasoning on top of the
metadata. Moreover, we will explore incorporating
meta-learning into the framework.
ACKNOWLEDGMENTS
This research has been funded by the European
Commission through the Erasmus Mundus Joint Doctorate
“Information Technologies for Business Intelligence -
Doctoral College” (IT4BI-DC).