Authors:
Mathias Goller (1); Markus Humer (2) and Michael Schrefl (3)
Affiliations:
(1) Data & Knowledge Engineering, Johannes-Kepler-University Linz, Austria
(2) utanet, Austria
(3) Data & Knowledge Engineering, Johannes-Kepler-University, Austria
Keyword(s):
Sequences of data mining algorithms, pre-computing intermediate results, clustering, decision tree construction, naive Bayes.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Business Analytics; Data Engineering; Data Mining; Databases and Information Systems Integration; Datamining; Enterprise Information Systems; Health Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Depending on its goal, an instance of the Knowledge Discovery in Databases (KDD) process may require more than a single data mining algorithm to determine a solution. Sequences of data mining algorithms offer room for improvement that is as yet unexploited.
If an algorithm is known to be the first in a sequence and other algorithms will run afterwards, it can determine intermediate results that the succeeding algorithms need. The anteceding algorithm can also determine helpful statistics for succeeding algorithms. As the anteceding algorithm has to scan the data anyway, these intermediate results are computed as a by-product of computing its own result.
On the one hand, a succeeding algorithm can save time because several of its steps have already been pre-computed. On the other hand, additional information about the analysed data can improve the quality of results, such as the accuracy of classification, as demonstrated in experiments with synthetic and real data.
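The idea of collecting intermediate results as a by-product of a scan can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pairing of a clustering assignment step with a later naive Bayes classifier is only suggested by the keywords, and the function names, the row layout, the `values_per_attr` smoothing parameter and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def assign_and_collect(rows, centroids):
    """One scan over the data (hypothetical helper).

    Assigns each row to its nearest centroid (the anteceding algorithm's
    own work) and, in the same pass, counts class labels and
    (attribute, value, class) co-occurrences for a later classifier.
    Each row is assumed to be (numeric_tuple, categorical_tuple, label).
    """
    assignments = []
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (attr_index, value) -> class counts
    for numeric, categorical, label in rows:
        # Clustering work: nearest centroid by squared Euclidean distance.
        nearest = min(
            range(len(centroids)),
            key=lambda c: sum((x - m) ** 2 for x, m in zip(numeric, centroids[c])),
        )
        assignments.append(nearest)
        # By-product: frequency statistics for the succeeding algorithm.
        class_counts[label] += 1
        for i, value in enumerate(categorical):
            value_counts[(i, value)][label] += 1
    return assignments, class_counts, value_counts


def naive_bayes_predict(categorical, class_counts, value_counts, values_per_attr=2):
    """Classify using only the pre-computed counts; no further data scan.

    values_per_attr is an assumed number of distinct values per attribute,
    used for add-one smoothing.
    """
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / total  # class prior
        for i, value in enumerate(categorical):
            score *= (value_counts[(i, value)][label] + 1) / (n + values_per_attr)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for this sketch.
rows = [
    ((1.0, 1.0), ("red",), "yes"),
    ((1.2, 0.8), ("red",), "yes"),
    ((5.0, 5.0), ("blue",), "no"),
    ((5.2, 4.9), ("blue",), "no"),
]
centroids = [(1.0, 1.0), (5.0, 5.0)]

assignments, class_counts, value_counts = assign_and_collect(rows, centroids)
prediction = naive_bayes_predict(("red",), class_counts, value_counts)
```

In this sketch the classifier never touches `rows` again: everything it needs was gathered while the clustering step scanned the data, which is the time saving the abstract describes for succeeding algorithms.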