FEATURE DISCRETIZATION AND SELECTION IN MICROARRAY DATA

Artur Ferreira, Mário Figueiredo

2011

Abstract

Tumor and cancer detection from microarray data are important bioinformatics problems. These problems are quite challenging for machine learning methods, since microarray datasets typically have a very large number of features and a small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. This paper proposes unsupervised feature discretization and selection methods suited for microarray data. Experimental results on public-domain microarray datasets show that the proposed discretization and selection techniques are competitive with the best previous approaches. Moreover, the proposed methods efficiently handle multi-class microarray data.
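
The abstract does not spell out the proposed methods, so the following is only a generic sketch of what unsupervised discretization and selection can look like, not the authors' technique. It assumes equal-width binning for discretization and a variance ranking for selection; the function names, the toy data, and the choices of `n_bins` and `k` are all hypothetical. Neither step uses class labels, which is what makes them unsupervised.

```python
from statistics import pvariance


def equal_width_discretize(column, n_bins=4):
    """Map each value of one feature to a bin index in [0, n_bins - 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: every value falls in bin 0.
        return [0] * len(column)
    width = (hi - lo) / n_bins
    # min(..., n_bins - 1) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def select_by_variance(columns, k):
    """Return the indices of the k features with the largest variance.

    Assumption: dispersion is used as a label-free relevance score.
    """
    ranked = sorted(
        range(len(columns)),
        key=lambda j: pvariance(columns[j]),
        reverse=True,
    )
    return sorted(ranked[:k])


# Toy data (hypothetical): 3 features stored column-wise, 4 instances each.
features = [
    [0.1, 0.2, 0.1, 0.2],  # small spread
    [1.0, 5.0, 9.0, 3.0],  # large spread
    [2.0, 2.0, 2.0, 2.0],  # constant, carries no information
]

# Keep the most dispersed feature, then discretize only what was kept.
selected = select_by_variance(features, k=1)
discretized = [equal_width_discretize(features[j], n_bins=4) for j in selected]
```

On real microarray data the number of features is in the thousands while the number of instances is in the tens, so a cheap per-feature score like this one is attractive mainly because it scales linearly with the number of features.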

Paper Citation


in Harvard Style

Ferreira A. and Figueiredo M. (2011). FEATURE DISCRETIZATION AND SELECTION IN MICROARRAY DATA. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 457-461. DOI: 10.5220/0003662004650469


in Bibtex Style

@conference{kdir11,
author={Artur Ferreira and Mário Figueiredo},
title={FEATURE DISCRETIZATION AND SELECTION IN MICROARRAY DATA},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={457-461},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003662004650469},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - FEATURE DISCRETIZATION AND SELECTION IN MICROARRAY DATA
SN - 978-989-8425-79-9
AU - Ferreira A.
AU - Figueiredo M.
PY - 2011
SP - 457
EP - 461
DO - 10.5220/0003662004650469