A COMPREHENSIVE DATASET FOR EVALUATING APPROACHES OF VARIOUS META-LEARNING TASKS

Matthias Reif

Abstract

New approaches in pattern recognition are typically evaluated against standard datasets, e.g. from UCI or StatLib. Using the same publicly available datasets makes evaluations easier to compare and reproduce. In meta-learning, however, the dataset used for evaluation is itself constructed from multiple other datasets, and no comprehensive meta-learning dataset is currently publicly available. In this paper, we present a novel, publicly available dataset for meta-learning based on 83 datasets, six classification algorithms, and 49 meta-features. Different target variables, such as the accuracy and training time of the classifiers as well as parameter-dependent measures, are included as ground-truth information. The meta-dataset can therefore be used for various meta-learning tasks, e.g. predicting the accuracy and training time of classifiers or predicting optimal parameter values. With the presented meta-dataset, new meta-learning approaches can be evaluated convincingly and compared on a common basis.
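
As a rough illustration of what a meta-dataset of this kind looks like, the following Python sketch treats each base dataset as one row of meta-features plus target values, and predicts a target for a new dataset with a 1-nearest-neighbour meta-learner. Everything in it is an assumption made for illustration: the four meta-features, the dictionary layout, the names `simple_meta_features` and `predict_target`, and all numbers are placeholders, not the 49 meta-features, the six classifiers, or the file format of the published dataset.

```python
import math
from collections import Counter


def simple_meta_features(X, y):
    """Compute a few illustrative meta-features for one base dataset."""
    n = len(y)
    counts = Counter(y)
    # Shannon entropy of the class distribution, in bits.
    class_entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "n_samples": float(n),
        "n_attributes": float(len(X[0])),
        "n_classes": float(len(counts)),
        "class_entropy": class_entropy,
    }


def predict_target(meta_dataset, query, target):
    """1-nearest-neighbour meta-learner over unscaled meta-features."""
    def distance(row):
        # Euclidean distance between a stored row and the query features.
        return math.sqrt(sum((row["features"][k] - query[k]) ** 2 for k in query))

    nearest = min(meta_dataset, key=distance)
    return nearest["targets"][target]


# Placeholder meta-dataset: names and numbers are invented for illustration.
meta_dataset = [
    {"name": "toy_small",
     "features": {"n_samples": 4.0, "n_attributes": 2.0,
                  "n_classes": 2.0, "class_entropy": 1.0},
     "targets": {"accuracy": 0.90, "training_time_s": 0.01}},
    {"name": "toy_large",
     "features": {"n_samples": 1000.0, "n_attributes": 20.0,
                  "n_classes": 5.0, "class_entropy": 2.1},
     "targets": {"accuracy": 0.75, "training_time_s": 1.50}},
]

# A new base dataset whose classifier performance we want to estimate.
X_new = [[0.1, 1.0], [0.2, 0.9], [0.8, 0.1], [0.9, 0.2], [0.5, 0.5], [0.4, 0.6]]
y_new = ["a", "a", "b", "b", "a", "b"]
query = simple_meta_features(X_new, y_new)

predicted_accuracy = predict_target(meta_dataset, query, "accuracy")
predicted_time = predict_target(meta_dataset, query, "training_time_s")
```

The same rows serve both tasks named in the abstract: only the target key changes between predicting accuracy and predicting training time. A realistic meta-learner would at least scale the meta-features before computing distances, since `n_samples` otherwise dominates the comparison.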

Paper Citation


in Harvard Style

Reif M. (2012). A COMPREHENSIVE DATASET FOR EVALUATING APPROACHES OF VARIOUS META-LEARNING TASKS. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8425-98-0, pages 273-276. DOI: 10.5220/0003736302730276


in Bibtex Style

@conference{icpram12,
author={Matthias Reif},
title={A COMPREHENSIVE DATASET FOR EVALUATING APPROACHES OF VARIOUS META-LEARNING TASKS},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2012},
pages={273-276},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003736302730276},
isbn={978-989-8425-98-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - A COMPREHENSIVE DATASET FOR EVALUATING APPROACHES OF VARIOUS META-LEARNING TASKS
SN - 978-989-8425-98-0
AU - Reif M.
PY - 2012
SP - 273
EP - 276
DO - 10.5220/0003736302730276