Understanding the Interplay of Simultaneous Model Selection and Representation Optimization for Classification Tasks

Fabian Bürger, Josef Pauli

Abstract

The development of classification systems that meet the desired accuracy levels for real-world applications requires a lot of expertise. Numerous challenges, such as noisy feature data, suboptimal algorithms, and poorly chosen hyperparameters, degrade the generalization performance. In response, numerous solutions have been developed, e.g. feature selection, feature preprocessing, and automatic algorithm and hyperparameter selection. Furthermore, representation learning is emerging as a way to learn better features automatically. The challenge of finding a suitable and well-tuned algorithm combination for each learning task can be addressed by automatic optimization frameworks. However, the more components are optimized simultaneously, the more complex their interplay becomes with respect to generalization performance and optimization run time. This paper analyzes the interplay of the components in a holistic framework which optimizes the feature subset, feature preprocessing, representation learning, classifiers, and all hyperparameters. The evaluation on a real-world dataset that suffers from the curse of dimensionality shows the potential benefits and risks of such holistic optimization frameworks.
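To make the idea of simultaneous optimization concrete, the following is a minimal, self-contained sketch (not the authors' evolutionary framework) of a joint search over a configuration space that couples a feature subset with a classifier hyperparameter. Each candidate configuration encodes a binary feature mask and a k-NN neighborhood size; both are evaluated together on held-out data, since a feature subset that helps one hyperparameter setting may hurt another. All names and the toy dataset are illustrative assumptions.

```python
import random
from collections import Counter

random.seed(0)

# Toy dataset: 4 features, only the first two carry class information.
def make_data(n=200):
    X, y = [], []
    for _ in range(n):
        label = random.randint(0, 1)
        informative = [random.gauss(label * 2.0, 1.0) for _ in range(2)]
        noise = [random.gauss(0.0, 1.0) for _ in range(2)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def knn_predict(train_X, train_y, x, k):
    # Plain k-nearest-neighbors vote by squared Euclidean distance.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, x)), ty)
        for tx, ty in zip(train_X, train_y)
    )
    votes = Counter(ty for _, ty in dists[:k])
    return votes.most_common(1)[0][0]

def evaluate(config, train, test):
    """Holdout accuracy of one joint configuration (feature mask + k)."""
    mask, k = config["mask"], config["k"]
    select = lambda X: [[v for v, m in zip(x, mask) if m] for x in X]
    tr_X, tr_y = select(train[0]), train[1]
    te_X, te_y = select(test[0]), test[1]
    hits = sum(knn_predict(tr_X, tr_y, x, k) == yy
               for x, yy in zip(te_X, te_y))
    return hits / len(te_y)

def random_config(n_features=4):
    # Sample the feature subset and the hyperparameter together.
    mask = [random.random() < 0.5 for _ in range(n_features)]
    if not any(mask):  # keep at least one feature
        mask[random.randrange(n_features)] = True
    return {"mask": mask, "k": random.choice([1, 3, 5, 7])}

X, y = make_data()
train, test = (X[:150], y[:150]), (X[150:], y[150:])

# Random search stands in for the evolutionary optimizer of the paper.
best = max((random_config() for _ in range(40)),
           key=lambda c: evaluate(c, train, test))
print(best, evaluate(best, train, test))
```

A real framework of the kind the paper studies would extend the configuration with preprocessing choices, a representation-learning method, and the classifier family itself, which is exactly where the interplay between components (and the run-time cost) becomes hard to predict.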



Paper Citation


in Harvard Style

Bürger F. and Pauli J. (2016). Understanding the Interplay of Simultaneous Model Selection and Representation Optimization for Classification Tasks. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 283-290. DOI: 10.5220/0005705302830290


in Bibtex Style

@conference{icpram16,
author={Fabian Bürger and Josef Pauli},
title={Understanding the Interplay of Simultaneous Model Selection and Representation Optimization for Classification Tasks},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={283-290},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005705302830290},
isbn={978-989-758-173-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Understanding the Interplay of Simultaneous Model Selection and Representation Optimization for Classification Tasks
SN - 978-989-758-173-1
AU - Bürger F.
AU - Pauli J.
PY - 2016
SP - 283
EP - 290
DO - 10.5220/0005705302830290