A COMPREHENSIVE DATASET FOR EVALUATING APPROACHES
OF VARIOUS META-LEARNING TASKS
Matthias Reif
German Research Center for Artificial Intelligence, Trippstadter Strasse 122, 67663 Kaiserslautern, Germany
Keywords:
Meta-learning, Ranking, Algorithm selection, Dataset, Pattern recognition, Classification.
Abstract:
New approaches in pattern recognition are typically evaluated against standard datasets, e.g. from UCI or
StatLib. Using the same and publicly available datasets increases the comparability and reproducibility of
evaluations. In the field of meta-learning, the actual dataset for evaluation is created based on multiple other
datasets. Unfortunately, no comprehensive dataset for meta-learning is currently publicly available. In this
paper, we present a novel and publicly available dataset for meta-learning based on 83 datasets, six classi-
fication algorithms, and 49 meta-features. Different target variables like accuracy and training time of the
classifiers as well as parameter dependent measures are included as ground-truth information. Therefore, the
meta-dataset can be used for various meta-learning tasks, e.g. predicting the accuracy and training time of
classifiers or predicting the optimal parameter values. Using the presented meta-dataset, a convincing and
comparable evaluation of new meta-learning approaches is possible.
1 INTRODUCTION
For a convincing evaluation of new pattern recogni-
tion methods, appropriate datasets are essential and
a sound and fair comparison of competing methods requires that each method be evaluated on exactly the same data. Therefore, many scientific papers use the same datasets for their evaluations, taken from com-
mon sources like the UCI machine learning reposi-
tory (Asuncion and Newman, 2007) or StatLib (Vla-
chos, 1998).
In meta-learning, a dataset is based on multiple other datasets and contains empirical knowledge about how learning algorithms, so-called target algorithms, performed on these datasets. Obtaining this knowledge requires that multiple target algorithms are applied on multiple datasets. Depending on the number of considered algorithms and datasets, the creation of a meta-dataset can be very computationally expensive. For the meta-learning step, datasets are represented by their characteristics, so-called meta-features.
Unfortunately, previous publications in the do-
main of meta-learning typically use their own data for
evaluation that is not publicly available. The repro-
duction of such a meta-dataset is theoretically possi-
ble, but very hard in practice due to missing informa-
tion about used datasets, parameter values, and im-
plementations. Moreover, meta-learning methods are
usually evaluated only on a small number of under-
lying datasets using a set of unoptimized target clas-
sifiers that is not diverse. In this paper, we present
a novel dataset that overcomes these limitations. The
dataset was created using 83 datasets from different
domains and sources, six target classifiers with dif-
ferent theoretical foundations including a parameter
optimization, and 49 meta-features, calculated by an
R-script that we made publicly available as well. Ad-
ditionally, the presented dataset includes multiple tar-
get measures such as accuracy and run-time that are
also available for each parameter combination consid-
ered during optimization.
The rest of the paper is structured as follows. First,
we give a more detailed introduction to meta-learning
in Section 2. In Section 3, we describe the creation of
the dataset. The final section comprises the conclu-
sion.
2 META-LEARNING
Meta-learning uses knowledge about algorithms and
known datasets in order to make a prediction for a new
dataset. Datasets are represented by their properties
using different measures, so-called meta-features. The meta-features and the desired target variable are computed for all known datasets. These data form the training data for the meta-learning step. The
resulting model is used for predicting the target vari-
able for a new, unknown dataset by applying it on the
meta-features of the new dataset. This approach is il-
lustrated in Figure 1.
Figure 1: Meta-learning uses meta-features and the desired
target value of known datasets for creating a meta-model
(top). This model is later used to predict the target value for
a new dataset (bottom).
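As an illustration of this workflow, the following minimal sketch trains a meta-model on a table of precomputed meta-features and target values and applies it to a new dataset. The file name, column names, and the use of scikit-learn are assumptions for illustration and are not part of the presented meta-dataset.

```python
# Minimal sketch of the meta-learning workflow from Figure 1.
# Assumed input: a CSV with one row per known dataset, containing
# meta-feature columns plus a "target" column (e.g. the best classifier).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

meta = pd.read_csv("meta_dataset.csv")            # hypothetical file name
X = meta.drop(columns=["dataset", "target"])      # meta-features
y = meta["target"]                                # desired target variable

meta_model = RandomForestClassifier(random_state=0)
meta_model.fit(X, y)                              # Figure 1, top: build the meta-model

new_dataset_features = X.iloc[[0]]                # meta-features of a "new" dataset
print(meta_model.predict(new_dataset_features))   # Figure 1, bottom: predict its target value
```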
The target variable depends on the goal of the
meta-learning approach. In the following, we will
present several meta-learning tasks that can be di-
rectly applied on the presented meta-dataset.
Best Classifier. In this task, the target variable is
the best classifier for each single dataset according
to some performance measure, e.g. the classification
accuracy. Since this is a classification problem, any
classifier can be used. The outcome of the prediction
model is the best classifier for the new dataset. This
approach was investigated in (Bensusan and Giraud-
Carrier, 2000a; Ali and Smith, 2006).
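A protocol frequently used to evaluate this task in the meta-learning literature is leave-one-dataset-out cross-validation. The sketch below illustrates it with placeholder arrays (`meta_X`, `best_clf`); these inputs and the choice of a random forest meta-learner are assumptions, not taken from the cited works.

```python
# Sketch: predicting the best classifier with leave-one-dataset-out
# cross-validation. meta_X (n_datasets x n_meta_features) and
# best_clf (n_datasets,) are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
meta_X = rng.random((83, 49))                               # placeholder meta-features
best_clf = rng.choice(["SVM", "Ripper", "k-NN"], size=83)   # placeholder labels

scores = cross_val_score(RandomForestClassifier(random_state=0),
                         meta_X, best_clf, cv=LeaveOneOut())
print("leave-one-dataset-out accuracy:", scores.mean())
```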
Ranking. The goal is to predict a ranked list of
all considered target algorithms, sorted according to
some performance measure, e.g. accuracy or time.
The target variable consists of the sorted list; a nearest-neighbor approach combined with scoring is typically used to predict the ranking (Brazdil and Soares, 2000; Brazdil et al., 2003; Vilalta et al., 2004).
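A minimal sketch of such a nearest-neighbor ranking predictor is given below; the input arrays and the simple averaging of ranks are illustrative assumptions, not the exact procedure of the cited works.

```python
# Sketch of nearest-neighbour based ranking: find the k datasets most
# similar in meta-feature space and average the classifier ranks observed
# on them. meta_X and ranks (rank 1 = best) are assumed inputs.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def predict_ranking(meta_X, ranks, new_meta_features, k=3):
    nn = NearestNeighbors(n_neighbors=k).fit(meta_X)
    _, idx = nn.kneighbors(new_meta_features.reshape(1, -1))
    mean_ranks = ranks[idx[0]].mean(axis=0)     # average rank per classifier
    return np.argsort(mean_ranks)               # classifier indices, best first
```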
Quantitative Prediction. This approach directly pre-
dicts the performance or run-time of the target algo-
rithm in an appropriate unit. Since the prediction is independently performed for each considered target algorithm, a separate regression model has to be trained for each of them. The quantitative prediction of error values was evaluated by (Gama and Brazdil, 1995; Sohn, 1999; Köpf
et al., 2000; Bensusan and Kalousis, 2001) and the
prediction of training-time was evaluated by (Reif
et al., 2011).
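The following sketch illustrates the one-regressor-per-algorithm setup; the input structures and the choice of a random forest regressor are assumptions for illustration.

```python
# Sketch of quantitative prediction: an independent regression model per
# target algorithm. meta_X is the meta-feature matrix, performance maps
# each classifier name to its accuracy (or run-time) per dataset.
from sklearn.ensemble import RandomForestRegressor

def train_performance_models(meta_X, performance):
    models = {}
    for clf_name, y in performance.items():
        model = RandomForestRegressor(random_state=0)
        model.fit(meta_X, y)                     # separate model per classifier
        models[clf_name] = model
    return models

# models["SVM"].predict(new_meta_features) then estimates, e.g., the SVM's
# accuracy on a new dataset.
```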
Predicting Parameters. Besides algorithm selection,
meta-learning can also be used for parameter predic-
tion. In this context, the target variable is one pa-
rameter value or a set of parameter values. Soares et al. have already investigated parameter selection using
meta-learning for the Support Vector Machine classi-
fier (Soares et al., 2004; Soares and Brazdil, 2006).
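As an illustrative sketch of this task, one can regress a parameter value on the meta-features; the choice of the SVM kernel width and of a log-scaled target below are assumptions, not the method of Soares et al.

```python
# Sketch of parameter prediction: regress a good parameter value (here,
# the RBF kernel width gamma of an SVM) on the meta-features. Regressing
# log10(gamma) is an assumption to handle the wide value range.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_gamma_model(meta_X, best_gamma):
    model = RandomForestRegressor(random_state=0)
    model.fit(meta_X, np.log10(best_gamma))
    return model

def predict_gamma(model, new_meta_features):
    return 10.0 ** model.predict(new_meta_features)
```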
3 META-DATASET
In this section, the components of the dataset and its
creation will be described in more detail.
Meta-features. Meta-features can be grouped ac-
cording to their underlying analysis concepts. The
presented meta-dataset includes 49 meta-features
from the following six groups.
Simple Features are directly and easily accessible
properties of the dataset that require almost no computation, such as the number of classes or the number of attributes. We included 17 simple meta-features.
Statistical Features use statistical analysis methods
and tests (Engels and Theusinger, 1998; Sohn, 1999).
Seven measures have been included, e.g. skewness
and kurtosis.
Information-theoretic Features typically use en-
tropy measures of the attributes and the class la-
bel (Segrera et al., 2008). We used seven features of
this group.
Model-based Features create a model of the data,
e.g. a decision tree, and use properties of it, e.g. the
width and height of the tree, as features. We fol-
lowed (Peng et al., 2002) and used 17 properties of
a decision tree.
Landmarking Features apply fast computable clas-
sifiers, e.g. Naive Bayes or 1-Nearest Neighbor, on
the dataset (Pfahringer et al., 2000; Bensusan and
Giraud-Carrier, 2000b) and use the resulting perfor-
mance as meta-features. The meta-dataset contains
14 landmarking features.
Time-based Features are specialized for time predic-
tions. They contain time measures of several compu-
tations regarding the dataset, e.g. the time for com-
puting the other meta-features. Meta-features of this
group have the benefit that they are able to take the
performance of the computer into account. Nine dif-
ferent time-measures have been included as presented
in (Reif et al., 2011).
The complete list of meta-features can be found on the dataset website (http://www.dfki.uni-kl.de/~reif/datasets/).
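To illustrate what such meta-features look like in practice, the following sketch computes one or two representatives of the groups described above for a given dataset. It is a simplified stand-in for the R script; the use of scikit-learn and SciPy is an assumption for illustration.

```python
# Illustrative computation of a few meta-features from several groups;
# the actual 49 features are defined in the R script on the dataset website.
import time
import numpy as np
from scipy.stats import skew, kurtosis, entropy
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def example_meta_features(X, y, cv=10):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    start = time.process_time()
    class_counts = np.bincount(np.unique(y, return_inverse=True)[1])
    features = {
        "n_samples": X.shape[0],                         # simple
        "n_attributes": X.shape[1],                      # simple
        "n_classes": len(class_counts),                  # simple
        "mean_skewness": float(np.mean(skew(X))),        # statistical
        "mean_kurtosis": float(np.mean(kurtosis(X))),    # statistical
        "class_entropy": float(entropy(class_counts, base=2)),  # information-theoretic
        "tree_depth": DecisionTreeClassifier(random_state=0)    # model-based
            .fit(X, y).get_depth(),
        "landmark_1nn": float(np.mean(cross_val_score(   # landmarking
            KNeighborsClassifier(n_neighbors=1), X, y, cv=cv))),
    }
    features["meta_feature_time"] = time.process_time() - start  # time-based
    return features
```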
Datasets. We used 83 datasets from the UCI machine learning repository (Asuncion and Newman, 2007), from StatLib (Vlachos, 1998), and from the book "Analyzing Categorical Data" (Simonoff, 2003). All datasets contain 10 to 435 samples with 1 to 69 nominal and numeric attributes and 2 to 24 classes. The complete list can also be found on the dataset website.
Table 1: The classifiers, the number of optimized parame-
ters, and the number of evaluated parameter combinations
used for creating the meta-dataset.
Classifier Parameters Combinations
Decision Tree 5 161051
k-NN 2 152
MLP 3 242
Naive Bayes 1 2
Ripper 4 2662
SVM 2 225
Classifiers. We selected classifiers that use different
learning foundations like tree or rule based learners
but also statistical and instance-based learners as well
as neural networks. The selected classification algo-
rithms as well as the number of parameters optimized
during evaluation are listed in Table 1. Complete details are given on the dataset website.
3.1 Generation
After all features were normalized to the range [0, 1] and nominal features were converted to numeric
features for the SVM and MLP classifiers, every clas-
sifier was evaluated on each dataset using a grid
search and 10-fold cross-validation. The accuracy
of a classifier is the highest accuracy achieved dur-
ing the search. The total training time of a classifier
is the run-time of the search. Accuracy and training
time were also recorded for every considered param-
eter combination.
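The following sketch mirrors this procedure for a single classifier and dataset: a grid search over parameter combinations, 10-fold cross-validation per combination, and accuracy and time recorded for each. The original experiments used RapidMiner; scikit-learn and the small k-NN grid below are stand-ins for illustration.

```python
# Sketch of the ground-truth generation: grid search with 10-fold
# cross-validation, recording accuracy and training time per parameter
# combination. k-NN and its grid are illustrative placeholders.
import time
from itertools import product
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_classifier(X, y, grid):
    records = []
    search_start = time.process_time()
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        t0 = time.process_time()
        acc = np.mean(cross_val_score(KNeighborsClassifier(**params), X, y, cv=10))
        records.append({**params, "accuracy": acc,
                        "time": time.process_time() - t0})
    best_accuracy = max(r["accuracy"] for r in records)   # accuracy of the classifier
    total_time = time.process_time() - search_start       # total training time
    return best_accuracy, total_time, records

# Example grid:
# evaluate_classifier(X, y, {"n_neighbors": [1, 3, 5, 7],
#                            "weights": ["uniform", "distance"]})
```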
The ranking of classifiers for a single dataset was
determined by ordering the classifiers according to
their accuracy or total training time, respectively. The
best classifier for a dataset is the top-ranked classifier.
However, several classifiers may achieve the same ac-
curacy for a dataset. In such cases, classifiers with
equal accuracy were ordered according to their prior
probability of being the best classifier. A different or-
dering, if necessary, can be easily achieved by using
the provided accuracy values.
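A small sketch of this ordering with tie-breaking is given below; all classifier names and numbers are made up for illustration.

```python
# Sketch of the ranking with tie-breaking: sort by accuracy, and break
# ties by the prior probability of being the best classifier. All values
# here are illustrative placeholders.
def rank_classifiers(accuracy, prior_best):
    return sorted(accuracy, key=lambda c: (accuracy[c], prior_best[c]), reverse=True)

print(rank_classifiers(
    accuracy={"SVM": 0.91, "Ripper": 0.91, "Naive Bayes": 0.85},
    prior_best={"SVM": 0.35, "Ripper": 0.25, "Naive Bayes": 0.10}))
# -> ['SVM', 'Ripper', 'Naive Bayes']
```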
The ground-truth data was created using Rapid-
Miner (Mierswa et al., 2006). Target times were gath-
ered by measuring the thread CPU time. For the cal-
culation of the meta-features, we wrote an R script
that is freely available on the dataset website and can be used to easily extend the meta-dataset with more datasets.
Figure 2: Statistics of the classifiers. (a) The mean accuracy achieved by the classifiers over all 83 datasets, including standard deviation. (b) The number of datasets on which a classifier achieved the highest accuracy, overall and solely.

Based on the generated data, we created several variants of the meta-dataset that are directly applicable to one of the tasks described in Section 2. All
of these variants share most of the meta-features and
mainly differ in the target variable. Variants with
an accuracy related target value contain all meta-
features but the time-based measures whereas the
variants for time-based predictions contain all meta-
features but the landmarking features. Datasets for
parameter prediction contain all parameter combina-
tions. All variants are available as separate plain CSV
files and in the XRFF format (http://weka.wikispaces.com/XRFF) on the dataset website.
3.2 Statistics
Finally, we present some statistics of the meta-dataset.
Figure 2(a) shows the classification accuracy achieved
by the target classifiers averaged over all datasets in-
cluding standard deviation. It is visible that the more sophisticated algorithms achieve almost the same average accuracy and that even the simple k-Nearest Neighbor algorithm achieves comparable results.
However, if we look at the frequency of a classi-
fier being the best choice for a dataset, the differences
are more significant. Figure 2(b) shows how often a
classifier achieved the highest accuracy solely (dark
gray) and how often it achieved the highest accuracy
where another classifier achieved this value as well
(light gray). It is visible that SVM and Ripper are superior in many cases, but the simple approaches of k-Nearest Neighbor and Naive Bayes are also the best classifiers for several datasets.
4 CONCLUSIONS
In this paper, we presented a novel and publicly avail-
able dataset that allows rapid experiments and evalu-
ations of various meta-learning approaches.
The dataset is based on six classifiers with differ-
ent theoretical foundations, 83 datasets from differ-
ent domains, and 49 meta-features from six different
groups. The R-script for computing the meta-features
is also publicly available to make extensions of the
meta-dataset easier.
A brief analysis of the gathered data showed that the accuracy of a specific classifier varies considerably across datasets and that even very simple classifiers like Naive Bayes are still the best choice for some datasets. Both aspects make the presented meta-dataset, and meta-learning in general, a challenging task.
REFERENCES
Ali, S. and Smith, K. A. (2006). On learning algorithm
selection for classification. Applied Soft Computing,
6:119–138.
Asuncion, A. and Newman, D. (2007).
UCI machine learning repository.
http://www.ics.uci.edu/mlearn/MLRepository.html
University of California, Irvine, School of Informa-
tion and Computer Sciences.
Bensusan, H. and Giraud-Carrier, C. (2000a). Casa Batlló is in Passeig de Gràcia or how landmark performances can
describe tasks. In Proceedings of the ECML-00 Work-
shop on Meta-Learning: Building Automatic Advice
Strategies for Model Selection and Method Combina-
tion, pages 29–46.
Bensusan, H. and Giraud-Carrier, C. G. (2000b). Discov-
ering task neighbourhoods through landmark learning
performances. In PKDD ’00: Proceedings of the 4th
European Conference on Principles of Data Mining
and Knowledge Discovery, pages 325–330, London,
UK. Springer Berlin / Heidelberg.
Bensusan, H. and Kalousis, A. (2001). Estimating the pre-
dictive accuracy of a classifier. In De Raedt, L. and
Flach, P., editors, Machine Learning: ECML 2001,
volume 2167 of Lecture Notes in Computer Science,
pages 25–36. Springer Berlin / Heidelberg.
Brazdil, P., Soares, C., and da Costa, J. P. (2003). Rank-
ing learning algorithms: Using IBL and meta-learning
on accuracy and time results. Machine Learning,
50(3):251–277.
Brazdil, P. B. and Soares, C. (2000). Zoomed ranking: Se-
lection of classification algorithms based on relevant
performance information. In Proceedings of Prin-
ciples of Data Mining and Knowledge Discovery, 4th
European Conference, pages 126–135.
Engels, R. and Theusinger, C. (1998). Using a data met-
ric for preprocessing advice for data mining applica-
tions. In Proceedings of the European Conference on
Artificial Intelligence (ECAI-98), pages 430–434. John
Wiley & Sons.
Gama, J. and Brazdil, P. (1995). Characterization of classifi-
cation algorithms. In Pinto-Ferreira, C. and Mamede,
N., editors, Progress in Artificial Intelligence, vol-
ume 990 of Lecture Notes in Computer Science, pages
189–200. Springer Berlin / Heidelberg.
Köpf, C., Taylor, C., and Keller, J. (2000). Meta-analysis:
From data characterisation for meta-learning to meta-
regression. In Proceedings of the PKDD-00 Workshop
on Data Mining, Decision Support, Meta-Learning
and ILP.
Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., and
Euler, T. (2006). Yale: Rapid prototyping for com-
plex data mining tasks. In Ungar, L., Craven, M.,
Gunopulos, D., and Eliassi-Rad, T., editors, KDD ’06:
Proceedings of the 12th ACM SIGKDD international
conference on Knowledge discovery and data mining,
pages 935–940, New York, NY, USA. ACM.
Peng, Y., Flach, P., Soares, C., and Brazdil, P. (2002). Im-
proved dataset characterisation for meta-learning. In
Lange, S., Satoh, K., and Smith, C., editors, Discovery
Science, volume 2534 of Lecture Notes in Computer
Science, pages 193–208. Springer Berlin / Heidelberg.
Pfahringer, B., Bensusan, H., and Giraud-Carrier, C. (2000).
Meta-learning by landmarking various learning algo-
rithms. In Proceedings of the Seventeenth Interna-
tional Conference on Machine Learning, pages 743–
750. Morgan Kaufmann.
Reif, M., Shafait, F., and Dengel, A. (2011). Prediction of
classifier training time including parameter optimiza-
tion. In 34th Annual German Conference on Artificial
Intelligence (KI 2011), Berlin, Germany.
Segrera, S., Pinho, J., and Moreno, M. (2008). Information-
theoretic measures for meta-learning. In Corchado,
E., Abraham, A., and Pedrycz, W., editors, Hybrid Ar-
tificial Intelligence Systems, volume 5271 of Lecture
Notes in Computer Science, pages 458–465. Springer
Berlin / Heidelberg.
Simonoff, J. S. (2003). Analyzing Categorical Data.
Springer Texts in Statistics. Springer Berlin / Heidel-
berg.
Soares, C. and Brazdil, P. B. (2006). Selecting parameters
of SVM using meta-learning and kernel matrix-based
meta-features. In SAC ’06: Proceedings of the 2006
ACM symposium on Applied computing, pages 564–
568, New York, NY, USA. ACM.
Soares, C., Brazdil, P. B., and Kuba, P. (2004). A meta-
learning method to select the kernel width in support
vector regression. Machine Learning, 54(3):195–209.
Sohn, S. Y. (1999). Meta analysis of classification al-
gorithms for pattern recognition. Pattern Analy-
sis and Machine Intelligence, IEEE Transactions on,
21(11):1137 –1144.
Vilalta, R., Giraud-carrier, C., Brazdil, P. B., and Soares,
C. (2004). Using meta-learning to support data min-
ing. International Journal of Computer Science and
Applications, 1(1):31–45.
Vlachos, P. (1998). StatLib datasets archive.
http://lib.stat.cmu.edu Department of Statistics,
Carnegie Mellon University.