UNBOXING DATA MINING VIA DECOMPOSITION IN OPERATORS - Towards Macro Optimization and Distribution

Alexander Wöehrer; Yan Zhang; Ehtesam-ul-Haq Dar; Peter Brezany

doi:10.5220/0002333102430248

2009

Abstract

Data mining deals with finding hidden knowledge patterns in often huge data sets. The work presented in this paper elaborates on defining data mining tasks in terms of fine-grained, composable operators instead of coarse-grained, black-box algorithms. Data mining tasks in the knowledge discovery in databases (KDD) process typically need a single relational table as input and require data preprocessing and integration beforehand. Combining different kinds of operators (relational, data mining, and data preprocessing operators) gives a novel, holistic view of the knowledge discovery process. As described in this paper, this view is initially applied to the low-level execution phase, but it opens up the potential for rich optimization similar to relational query optimization. We argue that such macro-optimization, which embraces the overall KDD process, leads to better performance than micro-optimization, which focuses on only a small part of it.
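The abstract describes operator decomposition only at a high level. The following is a minimal Python sketch, not the authors' implementation, of how a relational operator, a preprocessing operator, and a data mining operator might be composed into one pipeline, and of why an optimizer that sees the whole pipeline could reorder them. The operator names (`select`, `discretize`, `class_counts`), the toy table, and the cut-off value are hypothetical illustrations; the streaming, generator-based style is an assumption chosen for brevity.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]  # hypothetical row type: attribute name -> value


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Relational operator: keep only rows that satisfy pred."""
    return (r for r in rows if pred(r))


def discretize(rows: Iterable[Row], attr: str, cut: float, stats: Counter) -> Iterator[Row]:
    """Preprocessing operator: replace a numeric attribute by 'low'/'high'.
    stats['discretized'] counts how many rows this operator had to touch."""
    for r in rows:
        stats["discretized"] += 1
        out = dict(r)  # copy so the input table is not mutated
        out[attr] = "low" if r[attr] < cut else "high"
        yield out


def class_counts(rows: Iterable[Row], attr: str, label: str) -> Counter:
    """Data mining operator: frequency of (attribute value, class) pairs,
    e.g. the sufficient statistics a Naive Bayes classifier would need."""
    return Counter((r[attr], r[label]) for r in rows)


# Hypothetical toy input table.
table: List[Row] = [
    {"region": "EU", "age": 23, "buys": "yes"},
    {"region": "US", "age": 41, "buys": "no"},
    {"region": "EU", "age": 57, "buys": "no"},
    {"region": "US", "age": 35, "buys": "yes"},
    {"region": "EU", "age": 30, "buys": "yes"},
]


def in_eu(r: Row) -> bool:
    return r["region"] == "EU"


# Plan A (order as a user might write it): preprocess everything, filter, mine.
stats_a: Counter = Counter()
plan_a = class_counts(select(discretize(table, "age", 40, stats_a), in_eu), "age", "buys")

# Plan B (rewritten): push the selection below discretize. This is valid here
# only because the predicate reads "region", which discretize does not change.
stats_b: Counter = Counter()
plan_b = class_counts(discretize(select(table, in_eu), "age", 40, stats_b), "age", "buys")

print(plan_a == plan_b)                                # True
print(stats_a["discretized"], stats_b["discretized"])  # 5 3
```

Both plans produce the same counts, but in plan B the preprocessing operator touches only the three EU rows instead of all five. A rewrite of this kind is only available to an optimizer that can see across operator boundaries; if preprocessing and mining were hidden inside one black-box algorithm, the selection could not be pushed past them.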

Paper Citation

in Harvard Style

Wöehrer A., Zhang Y., Dar E. and Brezany P. (2009). UNBOXING DATA MINING VIA DECOMPOSITION IN OPERATORS - Towards Macro Optimization and Distribution. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009), ISBN 978-989-674-011-5, pages 243-248. DOI: 10.5220/0002333102430248

in Bibtex Style

@conference{kdir09,
author={Alexander Wöehrer and Yan Zhang and Ehtesam-ul-Haq Dar and Peter Brezany},
title={UNBOXING DATA MINING VIA DECOMPOSITION IN OPERATORS - Towards Macro Optimization and Distribution},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)},
year={2009},
pages={243-248},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002333102430248},
isbn={978-989-674-011-5},
}

in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)
TI  - UNBOXING DATA MINING VIA DECOMPOSITION IN OPERATORS - Towards Macro Optimization and Distribution
SN  - 978-989-674-011-5
AU  - Wöehrer A.
AU  - Zhang Y.
AU  - Dar E.
AU  - Brezany P.
PY  - 2009
SP  - 243
EP  - 248
DO  - 10.5220/0002333102430248
ER  - 