CLASSIFICATION OF MASS SPECTROMETRY DATA - Using Manifold and Supervised Distance Metric Learning

Qingzhong Liu, Andrew H. Sung, Bernardete M. Ribeiro, Mengyu Qiao

Abstract

Mass spectrometry has become the most widely used measurement technique in proteomics research. The quality of the feature set and of the learning classifier applied to it determines the reliability of disease-status prediction. A well-known approach combines peak detection with support vector machine recursive feature elimination (SVMRFE). To compare feature selection methods and to search for an alternative learning classifier, in this paper we apply distance metric learning to the classification of proteomics mass spectrometry (MS) data. Experimental results show that distance metric learning is promising for the classification of proteomics data; its results are comparable to the best results obtained by applying SVM to the SVMRFE feature sets. The results also indicate the good potential of manifold learning for feature reduction in MS data analysis.
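The abstract does not spell out the learning algorithm, so the following is only a hedged illustration of the general idea, not the authors' method. Many supervised distance metric learning approaches learn a Mahalanobis matrix M = L^T L and then classify with k-nearest neighbours under that metric. The sketch below assumes a transform `L` has already been learned (the learning step itself is not shown); the function names `mahalanobis_sq` and `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def transform(L: Matrix, x: Vector) -> List[float]:
    """Apply the linear map L (r rows, d columns) to the d-dimensional vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def mahalanobis_sq(L: Matrix, x: Vector, y: Vector) -> float:
    """Squared distance (x - y)^T M (x - y) with M = L^T L, computed as ||L(x - y)||^2."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(v * v for v in transform(L, diff))


def knn_predict(L: Matrix, train_X: Sequence[Vector], train_y: Sequence[int],
                query: Vector, k: int = 3) -> int:
    """Majority vote among the k training points closest to query under the metric."""
    order = sorted(range(len(train_X)),
                   key=lambda i: mahalanobis_sq(L, train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, feature 1 is large-scale noise.
    train_X = [[0.0, 5.0], [0.1, -5.0], [1.0, 0.0], [0.9, 1.0]]
    train_y = [0, 0, 1, 1]
    query = [1.0, 4.5]

    identity = [[1.0, 0.0], [0.0, 1.0]]    # plain Euclidean distance
    weighted = [[1.0, 0.0], [0.0, 0.01]]   # assumed "learned" L that down-weights feature 1

    print(knn_predict(identity, train_X, train_y, query, k=1))  # 0 (noise dominates)
    print(knn_predict(weighted, train_X, train_y, query, k=1))  # 1
```

The design choice here is to store `L` rather than `M`, so that M = L^T L is positive semidefinite by construction and the distance is simply a Euclidean distance in the transformed space.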

Paper Citation


in Harvard Style

Liu Q., Sung A. H., Ribeiro B. M. and Qiao M. (2009). CLASSIFICATION OF MASS SPECTROMETRY DATA - Using Manifold and Supervised Distance Metric Learning. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009) ISBN 978-989-8111-65-4, pages 396-401. DOI: 10.5220/0001556403960401


in BibTeX Style

@conference{biosignals09,
author={Qingzhong Liu and Andrew H. Sung and Bernardete M. Ribeiro and Mengyu Qiao},
title={CLASSIFICATION OF MASS SPECTROMETRY DATA - Using Manifold and Supervised Distance Metric Learning},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)},
year={2009},
pages={396-401},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001556403960401},
isbn={978-989-8111-65-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)
TI - CLASSIFICATION OF MASS SPECTROMETRY DATA - Using Manifold and Supervised Distance Metric Learning
SN - 978-989-8111-65-4
AU - Liu Q.
AU - Sung A. H.
AU - Ribeiro B. M.
AU - Qiao M.
PY - 2009
SP - 396
EP - 401
DO - 10.5220/0001556403960401