method is said to follow a filter approach, and if the criterion depends on the rule, the method is said to follow a wrapper approach (Ambroise and McLachlan, 2002). The objective of this study was to develop a filter-based machine learning approach that produces robust classification of microarray data.
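The following minimal Python sketch illustrates the filter side of this distinction. It ranks genes by the absolute Welch two-sample t statistic. This criterion is computed from the expression values and class labels alone, so it does not depend on whatever classification rule is trained afterwards. The t statistic and the names `t_score` and `filter_select` are illustrative assumptions, not the selection criterion used in this paper.

```python
from statistics import mean, variance

def t_score(values, labels):
    """Welch two-sample t statistic of one gene between class 0 and class 1."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    # The small constant guards against division by zero for constant genes.
    return (mean(a) - mean(b)) / (se + 1e-12)

def filter_select(X, labels, m):
    """Filter selection: rank genes (rows of X) by |t| and keep the top m indices.

    No classifier is consulted, which is what makes this a filter approach.
    """
    scores = [abs(t_score(row, labels)) for row in X]
    order = sorted(range(len(X)), key=lambda g: scores[g], reverse=True)
    return order[:m]

# Toy data: 3 genes x 6 samples, first three samples in class 0.
X = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # strongly differential gene
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no class difference
    [0.5, 0.7, 0.6, 1.0, 1.3, 0.9],  # mildly differential gene
]
labels = [0, 0, 0, 1, 1, 1]
print(filter_select(X, labels, 2))
```

As Ambroise and McLachlan (2002) point out, such a ranking should be recomputed inside each cross-validation fold. Ranking genes on the full data set before splitting it biases the estimated error rate downwards.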
Based on our experiments, the proposed matrix factorisation performed effective dimensionality reduction as a preparation step for the subsequent supervised classification. Classifiers built in the metagene space, rather than in the original gene space, are more robust and reproducible, because the projection can reduce noise more than simple normalisation does. Algorithm 1, the main contribution of this paper, is conceptually simple; consequently, it is much faster than the popular NMF algorithm. The stability of the algorithm depends essentially on a properly selected learning rate, which must not be too large. Additional functions could be included so that the learning rate is reduced or increased depending on the current performance.
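The learning-rate adaptation suggested above can be illustrated with a minimal Python sketch. Algorithm 1 itself is not reproduced here. The sketch assumes a generic, unconstrained factorisation X ≈ U Vᵀ fitted by full-batch gradient descent on the squared error. It uses the "bold driver" heuristic: after a step that lowers the loss, the rate grows by 5%; otherwise the step is rejected and the rate is halved. The names `factorise`, `loss` and `reconstruct`, the random initialisation, and the constants `grow` and `shrink` are all illustrative assumptions. Unlike NMF, no non-negativity constraint is enforced.

```python
import random

def reconstruct(U, V, i, j):
    """Entry (i, j) of the product U V^T."""
    return sum(u * v for u, v in zip(U[i], V[j]))

def loss(X, U, V):
    """Sum of squared residuals of the approximation X ~ U V^T."""
    total = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            d = x - reconstruct(U, V, i, j)
            total += d * d
    return total

def factorise(X, k, lr=0.01, epochs=500, grow=1.05, shrink=0.5, seed=0):
    """Gradient descent on loss(X, U, V) with an adaptive ('bold driver') learning rate."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(m)]
    current = loss(X, U, V)
    for _ in range(epochs):
        # Residual matrix E = X - U V^T.
        E = [[X[i][j] - reconstruct(U, V, i, j) for j in range(m)] for i in range(n)]
        # Gradients of the squared error: dL/dU = -2 E V, dL/dV = -2 E^T U.
        gU = [[-2.0 * sum(E[i][j] * V[j][r] for j in range(m)) for r in range(k)]
              for i in range(n)]
        gV = [[-2.0 * sum(E[i][j] * U[i][r] for i in range(n)) for r in range(k)]
              for j in range(m)]
        newU = [[U[i][r] - lr * gU[i][r] for r in range(k)] for i in range(n)]
        newV = [[V[j][r] - lr * gV[j][r] for r in range(k)] for j in range(m)]
        candidate = loss(X, newU, newV)
        if candidate < current:
            # The loss went down: accept the step and become slightly bolder.
            U, V, current = newU, newV, candidate
            lr *= grow
        else:
            # The loss went up: reject the step and cut the learning rate.
            lr *= shrink
    return U, V, current

X = [[1.0, 1.0, 2.0], [2.0, 2.0, 4.0], [3.0, 3.0, 6.0]]  # a rank-1 toy matrix
U, V, final = factorise(X, k=1)
print(final)
```

Because rejected steps are discarded, the loss never increases from one epoch to the next. An over-large learning rate therefore costs a wasted iteration instead of destabilising the fit, which is one way to address the stability concern raised above.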
There are many advantages to such a metagene approach. By capturing the major, invariant biological features and reducing noise, metagenes provide descriptions of data sets that allow them to be more easily combined and compared. In addition, interpretation of the metagenes, which characterise a subtype or subset of samples, can give us insight into the underlying mechanisms and processes of a disease.
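One way metagenes make data sets comparable is by providing a common coordinate system: any new sample measured on the same genes can be mapped onto a fixed set of metagenes. The minimal Python sketch below assumes the metagenes are the columns of a genes × k matrix `W`, for instance one factor of a trained matrix factorisation. It computes the coordinates `h` that minimise ||x − W h||² by solving the normal equations WᵀW h = Wᵀx with Gaussian elimination. Ordinary least squares and the function name `project` are illustrative assumptions, not necessarily the projection used in this paper.

```python
def project(W, x):
    """Least-squares metagene coordinates h minimising ||x - W h||^2.

    W is a list of gene rows (genes x k metagenes); x lists the same genes in the same order.
    """
    k = len(W[0])
    # Normal equations A h = b with A = W^T W (k x k) and b = W^T x.
    A = [[sum(row[p] * row[q] for row in W) for q in range(k)] for p in range(k)]
    b = [sum(row[p] * xi for row, xi in zip(W, x)) for p in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    h = [0.0] * k
    for r in reversed(range(k)):
        h[r] = (b[r] - sum(A[r][c] * h[c] for c in range(r + 1, k))) / A[r][r]
    return h

# Three genes, two metagenes; the sample is exactly 2 * metagene0 + 3 * metagene1.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(project(W, [2.0, 3.0, 5.0]))
```

The resulting k-dimensional vectors, rather than thousands of raw expression values, can then be fed to a classifier or compared across data sets. Because forming WᵀW squares the condition number of `W`, a QR- or SVD-based solver would be preferable when metagenes are nearly collinear.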
The results that we obtained on three real datasets
confirm the potential of our approach.
REFERENCES
Alizadeh, A., Eisen, M., Davis, R., Ma, C., Lossos, I., Rosenwald, A., Boldrick, J., Sabet, H., Tran, T., and Yu, X. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403:503–511.
Ambroise, C. and McLachlan, G. (2002). Selection bias in gene extraction on the basis of microarray gene expression data. Proceedings of the National Academy of Sciences USA, 99:6562–6566.
Bohning, D. (1992). Multinomial logistic regression algorithm. Ann. Inst. Statist. Math., 44(1):197–200.
Brunet, J., Tamayo, P., Golub, T., and Mesirov, J. (2004). Metagenes and molecular pattern discovery using matrix factorisation. Proceedings of the National Academy of Sciences USA, 101(12):4164–4169.
Dettling, M. and Buhlmann, P. (2003). Boosting for tumor classification with gene expression data. Bioinformatics, 19(9):1061–1069.
Dudoit, S., Fridlyand, J., and Speed, T. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77–87.
Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46:389–422.
Khan, J., Wei, J., Ringner, M., Saal, L., Ladanyi, M., Westermann, F., Berthold, F., and Schwab, M. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7(6):673–679.
Koren, Y. (2009). Collaborative filtering with temporal dynamics. In KDD, pages 447–455.
Lee, D. and Seung, H. (2000). Algorithms for non-negative matrix factorisation. In Advances in Neural Information Processing Systems.
Mol, C., Mosci, S., Traskine, M., and Verri, A. (2009). A regularised method for selecting nested groups of relevant genes from microarray data. Journal of Computational Biology, 16(5):677–690.
Peng, H. and Ding, C. (2005). Minimum redundancy and maximum relevance feature selection and recent advances in cancer classification. In SIAM workshop on feature selection for data mining, pages 52–59.
Peng, Y. (2006). A novel ensemble machine learning for robust microarray data classification. Computers in Biology and Medicine, 36:553–573.
Tamayo, P., Scanfeld, D., Ebert, B., Gillette, M., Roberts, C., and Mesirov, J. (2007). Metagene projection for cross-platform, cross-species characterization of global transcriptional states. Proceedings of the National Academy of Sciences USA, 104(14):5959–5964.
Zhang, X., Lu, X., Shi, Q., Xu, X., Leung, H., Harris, L., Iglehart, J., Miron, A., and Wong, W. (2006). Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics, 7:197.
BIOINFORMATICS 2010 - International Conference on Bioinformatics