Representation Optimization with Feature Selection and Manifold Learning in a Holistic Classification Framework
Fabian Bürger, Josef Pauli
2015
Abstract
Many complex and high-dimensional real-world classification problems require a carefully chosen set of features, algorithms, and hyperparameters to achieve the desired generalization performance. The choice of a suitable feature representation has a great effect on prediction performance. Manifold learning techniques – like PCA, Isomap, Locally Linear Embedding (LLE) or autoencoders – are able to learn a more suitable representation automatically. However, the performance of a manifold learner depends heavily on the dataset. This paper presents a novel automatic optimization framework that incorporates multiple manifold learning algorithms into a holistic classification pipeline, together with feature selection and multiple classifiers with arbitrary hyperparameters. The highly combinatorial optimization problem is solved efficiently using evolutionary algorithms. Additionally, a multi-pipeline classifier based on the optimization trajectory is presented. An evaluation on several datasets shows that the proposed framework outperforms the Auto-WEKA framework in terms of generalization performance and optimization speed in many cases.
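To make the abstract's pipeline concrete, below is a minimal Python sketch of the kind of holistic configuration search described above, written against scikit-learn. It is an illustration under stated assumptions, not the authors' implementation: each genome encodes a feature-subset size, a manifold learner (none/PCA/Isomap/LLE), a target dimensionality, and an SVM regularization constant, and a small evolutionary loop with truncation selection and single-gene mutation optimizes cross-validated accuracy. The dataset (Wine), population sizes, and parameter ranges are all illustrative choices.

```python
# Illustrative sketch (not the authors' implementation): evolutionary search
# over holistic pipeline configurations, scored by cross-validated accuracy.
import random

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.manifold import Isomap, LocallyLinearEmbedding
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)  # stand-in dataset, an assumption
rng = random.Random(0)


def random_genome():
    return {
        "k": rng.randint(4, X.shape[1]),                  # selected features
        "manifold": rng.choice(["none", "pca", "isomap", "lle"]),
        "dims": rng.randint(2, 4),                        # embedding dimension
        "C": 10 ** rng.uniform(-2, 2),                    # SVM regularization
    }


def build_pipeline(genome):
    """Assemble the holistic pipeline encoded by one genome."""
    steps = [("scale", StandardScaler()),
             ("select", SelectKBest(f_classif, k=genome["k"]))]
    d = min(genome["dims"], genome["k"] - 1)
    if genome["manifold"] == "pca":
        steps.append(("embed", PCA(n_components=d)))
    elif genome["manifold"] == "isomap":
        steps.append(("embed", Isomap(n_components=d)))
    elif genome["manifold"] == "lle":
        steps.append(("embed", LocallyLinearEmbedding(
            n_components=d, n_neighbors=d + 5)))
    steps.append(("clf", SVC(C=genome["C"])))
    return Pipeline(steps)


def fitness(genome):
    # Cross-validated accuracy is the fitness of a pipeline configuration.
    return cross_val_score(build_pipeline(genome), X, y, cv=5).mean()


def mutate(genome):
    # Re-sample one randomly chosen gene from a fresh random genome.
    child, fresh = dict(genome), random_genome()
    key = rng.choice(list(child))
    child[key] = fresh[key]
    return child


population = [random_genome() for _ in range(12)]
trajectory = []  # per-generation champions, a basis for a multi-pipeline vote
for gen in range(5):
    scored = sorted(((fitness(g), g) for g in population),
                    key=lambda t: t[0], reverse=True)
    trajectory.append(scored[0][1])
    print(f"generation {gen}: best CV accuracy = {scored[0][0]:.3f}")
    parents = [g for _, g in scored[:6]]          # truncation selection
    population = parents + [mutate(rng.choice(parents)) for _ in range(6)]
```

The trajectory list mirrors, at toy scale, the paper's multi-pipeline idea: the per-generation champion pipelines could each be refit on the training data and combined by majority vote.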
References
- Åberg, M. and Wessberg, J. (2007). Evolutionary optimization of classifiers and features for single-trial EEG discrimination. Biomedical Engineering Online, 6(1):32.
- Bache, K. and Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml/.
- Belkin, M. and Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems (NIPS), volume 14, pages 585-591.
- Bengio, Y. (2000). Gradient-based optimization of hyperparameters. Neural computation, 12(8):1889-1900.
- Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(8):1798-1828.
- Bengio, Y., Paiement, J.-F., Vincent, P., Delalleau, O., Roux, N. L., and Ouimet, M. (2003). Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering. In Advances in Neural Information Processing Systems.
- Bergstra, J., Bardenet, R., Bengio, Y., and Kégl, B. (2011). Algorithms for hyper-parameter optimization. In 25th Annual Conference on Neural Information Processing Systems (NIPS 2011).
- Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13(1):281-305.
- Beyer, H.-G. and Schwefel, H.-P. (2002). Evolution strategies - a comprehensive introduction. Natural Computing, 1(1):3-52.
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, New York.
- Brand, M. (2002). Charting a manifold. In Advances in neural information processing systems, pages 961-968. MIT Press.
- Bürger, F., Buck, C., Pauli, J., and Luther, W. (2014). Image-based object classification of defects in steel using data-driven machine learning optimization. In Braz, J. and Battiato, S., editors, Proceedings of International Conference on Computer Vision Theory and Applications (VISAPP), pages 143-152.
- Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., and Stal, M. (1996). Pattern-Oriented Software Architecture Volume 1: A System of Patterns. Wiley.
- Donoho, D. L. and Grimes, C. (2003). Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591-5596.
- Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179-188.
- Fukumizu, K., Bach, F. R., and Jordan, M. I. (2004). Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res., 5:73-99.
- Globerson, A. and Roweis, S. T. (2005). Metric learning by collapsing classes. In Advances in neural information processing systems, pages 451-458.
- Goldberger, J., Roweis, S., Hinton, G., and Salakhutdinov, R. (2004). Neighbourhood components analysis. In Advances in Neural Information Processing Systems 17.
- He, X. and Niyogi, P. (2004). Locality preserving projections. In Advances in Neural Information Processing Systems, volume 16, page 153.
- He, X., Cai, D., Yan, S., and Zhang, H.-J. (2005). Neighborhood preserving embedding. In Tenth IEEE International Conference on Computer Vision (ICCV), volume 2, pages 1208-1213.
- Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507.
- Huang, C.-L. and Wang, C.-J. (2006). A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications, 31(2):231-240.
- Huang, G.-B., Zhu, Q.-Y., and Siew, C.-K. (2006). Extreme learning machine: theory and applications. Neurocomputing, 70(1):489-501.
- Huang, H.-L. and Chang, F.-L. (2007). ESVM: Evolutionary support vector machine for automatic feature selection and classification of microarray data. Biosystems, 90(2):516-528.
- Jain, A. K., Duin, R. P. W., and Mao, J. (2000). Statistical pattern recognition: a review. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(1):4-37.
- Kim, H., Howland, P., and Park, H. (2005). Dimension reduction in text classification with support vector machines. Journal of Machine Learning Research, 6:37-53.
- Ma, Y. and Fu, Y. (2011). Manifold Learning Theory and Applications. CRC Press.
- Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., Le, Q. V., and Ng, A. Y. (2011). On optimization methods for deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 265-272.
- Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559-572.
- Ranawana, R. and Palade, V. (2006). Multi-classifier systems: Review and a roadmap for developers. International Journal of Hybrid Intelligent Systems, 3(1):35-61.
- Schölkopf, B., Smola, A., and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319.
- Spearman, C. (1904). “General intelligence,” objectively determined and measured. The American Journal of Psychology, 15(2):201-292.
- Tenenbaum, J. B., De Silva, V., and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323.
- Thornton, C., Hutter, F., Hoos, H. H., and Leyton-Brown, K. (2013). Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Proc. of KDD-2013, pages 847-855.
- Van der Maaten, L. (2014). Matlab Toolbox for Dimensionality Reduction. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html.
- Van der Maaten, L., Postma, E., and Van Den Herik, H. (2009). Dimensionality reduction: A comparative review. Journal of Machine Learning Research, 10:1-41.
- Weinberger, K. Q. and Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207-244.
- Wolpert, D. H. (1996). The lack of a priori distinctions between learning algorithms. Neural computation, 8(7):1341-1390.
- Zhang, T., Yang, J., Zhao, D., and Ge, X. (2007). Linear local tangent space alignment and application to face recognition. Neurocomputing, 70(7):1547-1553.
Paper Citation
in Harvard Style
Bürger F. and Pauli J. (2015). Representation Optimization with Feature Selection and Manifold Learning in a Holistic Classification Framework. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 35-44. DOI: 10.5220/0005183600350044
in Bibtex Style
@conference{icpram15,
author={Fabian Bürger and Josef Pauli},
title={Representation Optimization with Feature Selection and Manifold Learning in a Holistic Classification Framework},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={35-44},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005183600350044},
isbn={978-989-758-076-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Representation Optimization with Feature Selection and Manifold Learning in a Holistic Classification Framework
SN - 978-989-758-076-5
AU - Bürger F.
AU - Pauli J.
PY - 2015
SP - 35
EP - 44
DO - 10.5220/0005183600350044