Improved Singular Value Decomposition for Supervised Learning in a High Dimensional Dataset

Ricco Rakotomalala, Faouzi Mhamdi

Abstract

Singular Value Decomposition (SVD) is a useful technique for dimensionality reduction with a controlled loss of information. This paper makes the very simple but worthwhile observation that SVD, being unsupervised, may erroneously retain many attributes that contain no information about the class label when it is used for a supervised learning task. We propose to first use a very tolerant filter to select, on a univariate basis, which attributes to include in the subsequent SVD. The features extracted from the relevant descriptors, the “latent variables”, make it possible to build a better classifier, with a significant improvement in the generalization error rate and less CPU time. We show the efficiency of this combination of feature selection and feature construction approaches in a protein classification context.
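
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which univariate filter or threshold is used, so the correlation ratio (the share of an attribute's variance explained by the class), the cutoff of 0.05, the function names `correlation_ratio` and `filtered_svd`, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def correlation_ratio(X, y):
    """Per-column eta^2: share of each attribute's variance explained by the class.

    Assumption: this score stands in for the paper's unspecified univariate filter.
    """
    overall_mean = X.mean(axis=0)
    total_ss = ((X - overall_mean) ** 2).sum(axis=0)
    between_ss = np.zeros(X.shape[1])
    for label in np.unique(y):
        rows = X[y == label]
        between_ss += len(rows) * (rows.mean(axis=0) - overall_mean) ** 2
    # Constant columns have zero variance; score them 0 instead of dividing by 0.
    return np.divide(between_ss, total_ss,
                     out=np.zeros_like(between_ss), where=total_ss > 0)

def filtered_svd(X, y, threshold=0.05, n_components=2):
    """Tolerant univariate filter, then truncated SVD on the surviving columns.

    Assumption: threshold=0.05 is an arbitrary "tolerant" cutoff, not a value
    taken from the paper.
    """
    keep = np.flatnonzero(correlation_ratio(X, y) >= threshold)
    Xc = X[:, keep] - X[:, keep].mean(axis=0)        # center before the SVD
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, len(s))
    latent = U[:, :k] * s[:k]                        # row scores on the first k factors
    return keep, latent

# Synthetic data (hypothetical): 3 class-dependent columns followed by 50 pure-noise columns.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
informative = y[:, None] + 0.5 * rng.normal(size=(n, 3))
noise = rng.normal(size=(n, 50))
X = np.hstack([informative, noise])

keep, latent = filtered_svd(X, y)
print("columns kept:", keep)
print("latent variables shape:", latent.shape)
```

The columns of `latent` would then be passed to any supervised learner in place of the original attributes. Because the filter is deliberately tolerant, it is meant only to discard attributes that are clearly unrelated to the class, leaving the SVD to handle redundancy among those that remain.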



Paper Citation


in Harvard Style

Rakotomalala R. and Mhamdi F. (2006). Improved Singular Value Decomposition for Supervised Learning in a High Dimensional Dataset. In 6th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2006) ISBN 978-972-8865-55-9, pages 38-47. DOI: 10.5220/0002472600380047


in Bibtex Style

@conference{pris06,
author={Ricco Rakotomalala and Faouzi Mhamdi},
title={Improved Singular Value Decomposition for Supervised Learning in a High Dimensional Dataset},
booktitle={6th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2006)},
year={2006},
pages={38-47},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002472600380047},
isbn={978-972-8865-55-9},
}


in EndNote Style

TY - CONF
JO - 6th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2006)
TI - Improved Singular Value Decomposition for Supervised Learning in a High Dimensional Dataset
SN - 978-972-8865-55-9
AU - Rakotomalala R.
AU - Mhamdi F.
PY - 2006
SP - 38
EP - 47
DO - 10.5220/0002472600380047