Representation Optimization with Feature Selection and Manifold Learning in a Holistic Classification Framework

Fabian Bürger, Josef Pauli

Abstract

Many complex, high-dimensional real-world classification problems require a carefully chosen set of features, algorithms and hyperparameters to achieve the desired generalization performance. The choice of a suitable feature representation has a great effect on prediction performance. Manifold learning techniques such as PCA, Isomap, Locally Linear Embedding (LLE) or autoencoders can learn a more suitable representation automatically. However, the performance of a manifold learner depends heavily on the dataset. This paper presents a novel automatic optimization framework that incorporates multiple manifold learning algorithms in a holistic classification pipeline together with feature selection and multiple classifiers with arbitrary hyperparameters. The highly combinatorial optimization problem is solved efficiently using evolutionary algorithms. Additionally, a multi-pipeline classifier based on the optimization trajectory is presented. The evaluation on several datasets shows that the proposed framework outperforms the Auto-WEKA framework in terms of generalization and optimization speed in many cases.
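
To make the idea of a jointly optimized pipeline concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a configuration made of a feature mask, one transform choice and one classifier hyperparameter, and searches it with a simple elitist evolutionary loop. All names (`make_data`, `fitness`, `mutate`, `evolve`), the synthetic data, the mutation rate and the population sizes are hypothetical; column standardization stands in for a manifold learner, and a plain k-nearest-neighbour vote stands in for the classifier set.

```python
import random

TRANSFORMS = ("identity", "standardize")  # stand-ins for manifold learners (assumption)
K_VALUES = (1, 3, 5)                       # kNN hyperparameter choices (assumption)

def make_data(rng, n=60, n_features=6):
    """Two-class toy data: only features 0 and 1 carry class information."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 0.5) if j < 2 else rng.gauss(0.0, 3.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def standardize(train, test):
    """Scale columns to zero mean and unit variance using training statistics."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    scale = lambda rows: [[(v - m) / s for v, m, s in zip(r, means, stds)]
                          for r in rows]
    return scale(train), scale(test)

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (binary labels 0/1)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: sum((a - b) ** 2
                                       for a, b in zip(train_X[i], x)))[:k]
    return 1 if 2 * sum(train_y[i] for i in nearest) > k else 0

def fitness(config, X, y):
    """Hold-out accuracy of the pipeline described by config."""
    mask, transform, k = config
    if not any(mask):
        return 0.0  # a pipeline without features cannot classify
    select = lambda rows: [[v for v, keep in zip(r, mask) if keep] for r in rows]
    split = len(X) * 2 // 3
    train_X, test_X = select(X[:split]), select(X[split:])
    if transform == "standardize":
        train_X, test_X = standardize(train_X, test_X)
    hits = sum(knn_predict(train_X, y[:split], x, k) == t
               for x, t in zip(test_X, y[split:]))
    return hits / len(test_X)

def mutate(config, rng, rate=0.2):
    """Flip mask bits and resample the other choices with probability rate."""
    mask, transform, k = config
    mask = tuple((not b) if rng.random() < rate else b for b in mask)
    if rng.random() < rate:
        transform = rng.choice(TRANSFORMS)
    if rng.random() < rate:
        k = rng.choice(K_VALUES)
    return mask, transform, k

def evolve(X, y, rng, population=8, generations=15):
    """Elitist loop: keep the better half, refill with mutated copies."""
    pop = [(tuple(rng.random() < 0.5 for _ in X[0]),
            rng.choice(TRANSFORMS), rng.choice(K_VALUES))
           for _ in range(population)]
    history = []  # best hold-out accuracy seen in each generation
    for _ in range(generations):
        parents = sorted(pop, key=lambda c: fitness(c, X, y),
                         reverse=True)[:population // 2]
        history.append(fitness(parents[0], X, y))
        pop = parents + [mutate(rng.choice(parents), rng)
                         for _ in range(population - len(parents))]
    best = max(pop, key=lambda c: fitness(c, X, y))
    return best, fitness(best, X, y), history

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    best, score, history = evolve(X, y, rng)
    print("best configuration:", best)
    print("hold-out accuracy:", score)
```

Because the better half of each generation is carried over unchanged, the best score recorded in `history` cannot decrease from one generation to the next. The per-generation record is only loosely analogous to the optimization trajectory mentioned in the abstract.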


Paper Citation


in Harvard Style

Bürger F. and Pauli J. (2015). Representation Optimization with Feature Selection and Manifold Learning in a Holistic Classification Framework. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 35-44. DOI: 10.5220/0005183600350044


in Bibtex Style

@conference{icpram15,
author={Fabian Bürger and Josef Pauli},
title={Representation Optimization with Feature Selection and Manifold Learning in a Holistic Classification Framework},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={35-44},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005183600350044},
isbn={978-989-758-076-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Representation Optimization with Feature Selection and Manifold Learning in a Holistic Classification Framework
SN - 978-989-758-076-5
AU - Bürger F.
AU - Pauli J.
PY - 2015
SP - 35
EP - 44
DO - 10.5220/0005183600350044