ON THE GRADIENT-BASED ALGORITHM FOR MATRIX FACTORIZATION APPLIED TO DIMENSIONALITY REDUCTION

Vladimir Nikulin, Geoffrey J. McLachlan

Abstract

The high dimensionality of microarray data, with the expression of thousands of genes measured in a much smaller number of samples, presents challenges that affect the applicability of the analytical results. In principle, it would be better to describe the data in terms of a small number of metagenes, derived by matrix factorization, which could reduce noise while still capturing the essential features of the data. We propose a fast and general method for matrix factorization, based on decomposition by parts, that can reduce the dimension of expression data from thousands of genes to several factors. Unlike classification and regression, matrix decomposition requires no response variable and thus falls into the category of unsupervised learning methods. We demonstrate the effectiveness of this approach for the supervised classification of gene expression data.
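
As an illustration of the general idea only, the sketch below factorizes a small genes-by-samples matrix X into A (genes by k) and B (k by samples) with plain stochastic gradient descent on the squared reconstruction error. It is not the authors' algorithm, which the abstract does not specify in detail. The names `factorize` and `squared_error`, the learning rate `lr`, the number of `epochs`, and the toy data are all assumptions made for this example.

```python
import random


def factorize(X, k, lr=0.01, epochs=500, seed=0):
    """Approximate X (p x n, list of lists) by A (p x k) times B (k x n).

    Generic stochastic gradient descent on the squared error; an
    illustrative assumption, not the method described in the paper.
    """
    rng = random.Random(seed)
    p, n = len(X), len(X[0])
    # Small random starting values for both factor matrices.
    A = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(p)]
    B = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(k)]
    cells = [(i, j) for i in range(p) for j in range(n)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for i, j in cells:
            # Residual of the current reconstruction at entry (i, j).
            err = X[i][j] - sum(A[i][f] * B[f][j] for f in range(k))
            for f in range(k):
                a, b = A[i][f], B[f][j]
                # Step both factors along the negative gradient direction.
                A[i][f] += lr * err * b
                B[f][j] += lr * err * a
    return A, B


def squared_error(X, A, B):
    """Sum of squared differences between X and the product A * B."""
    k = len(B)
    return sum(
        (X[i][j] - sum(A[i][f] * B[f][j] for f in range(k))) ** 2
        for i in range(len(X))
        for j in range(len(X[0]))
    )


# Toy data (hypothetical): 4 "genes" by 3 "samples", exactly rank one.
X = [[u * v for v in [1.0, 0.5, 2.0]] for u in [1.0, 2.0, 3.0, 4.0]]
A, B = factorize(X, k=2)
print(squared_error(X, A, B))
```

Under this reading, each column of B holds the k metagene values for one sample, so a downstream classifier would be trained on k features per sample rather than on thousands of genes.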



Paper Citation


in Harvard Style

Nikulin V. and McLachlan G. J. (2010). ON THE GRADIENT-BASED ALGORITHM FOR MATRIX FACTORIZATION APPLIED TO DIMENSIONALITY REDUCTION. In Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010) ISBN 978-989-674-019-1, pages 147-152. DOI: 10.5220/0002736601470152


in Bibtex Style

@conference{bioinformatics10,
author={Vladimir Nikulin and Geoffrey J. McLachlan},
title={ON THE GRADIENT-BASED ALGORITHM FOR MATRIX FACTORIZATION APPLIED TO DIMENSIONALITY REDUCTION},
booktitle={Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)},
year={2010},
pages={147-152},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002736601470152},
isbn={978-989-674-019-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)
TI - ON THE GRADIENT-BASED ALGORITHM FOR MATRIX FACTORIZATION APPLIED TO DIMENSIONALITY REDUCTION
SN - 978-989-674-019-1
AU - Nikulin V.
AU - McLachlan G. J.
PY - 2010
SP - 147
EP - 152
DO - 10.5220/0002736601470152