Gene Selection using a Hybrid Memetic and Nearest Shrunken Centroid Algorithm

Vinh Quoc Dang, Chiou-Peng Lam


High-throughput technologies such as microarrays and mass spectrometry have produced high-dimensional biological datasets in abundance and of increasing complexity. Prediction Analysis for Microarrays (PAM) is a well-known implementation of the Nearest Shrunken Centroid (NSC) method and has been widely used to classify biological data. In this paper, a hybrid approach combining NSC with a Memetic Algorithm (MA) is proposed to automatically search for an optimal range of shrinkage threshold values for NSC, with the aim of improving feature selection and classification accuracy. The approach was evaluated on nine biological datasets, and the results showed improved feature selection stability over existing evolutionary approaches as well as improved classification accuracy.
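To make the two ingredients concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. The NSC part follows the standard shrunken-centroid formulation of Tibshirani et al. (2002). Each class centroid is standardized against the overall centroid and soft-thresholded by a shrinkage threshold Δ. A gene is discarded when its shrunken difference is zero for every class. The scale factor m_k = sqrt(1/n_k - 1/n) is taken from the form given in The Elements of Statistical Learning. The memetic part is a deliberately simple stand-in. It evolves a single Δ rather than a range of thresholds, using blend crossover, Gaussian mutation and a hill-climbing local search. Its fitness (validation accuracy, with ties broken in favour of fewer selected genes) is an assumption. All function names (`fit_nsc`, `shrink`, `predict`, `fitness`, `memetic_search`) and all parameter values are hypothetical.

```python
import math
import random
from statistics import median


def fit_nsc(X, y):
    """Collect NSC statistics. X: list of samples (lists of floats); y: class labels."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    overall = [sum(x[i] for x in X) / n for i in range(p)]
    cent, nk = {}, {}
    for k in classes:
        rows = [x for x, lab in zip(X, y) if lab == k]
        nk[k] = len(rows)
        cent[k] = [sum(r[i] for r in rows) / nk[k] for i in range(p)]
    # Pooled within-class standard deviation s_i of each gene (n - K degrees of freedom).
    s = [math.sqrt(sum((x[i] - cent[lab][i]) ** 2 for x, lab in zip(X, y)) / (n - len(classes)))
         for i in range(p)]
    return {
        "classes": classes, "overall": overall, "cent": cent, "s": s,
        "s0": median(s),  # s0: median of the s_i, guards against tiny variances
        "mk": {k: math.sqrt(1 / nk[k] - 1 / n) for k in classes},
        "prior": {k: nk[k] / n for k in classes},
    }


def shrink(model, delta):
    """Soft-threshold d_ik by delta; return shrunken centroids and kept gene indices."""
    shrunk, kept = {}, set()
    for k in model["classes"]:
        row = []
        for i, xbar in enumerate(model["overall"]):
            scale = model["mk"][k] * (model["s"][i] + model["s0"])
            d = (model["cent"][k][i] - xbar) / scale
            d_shr = math.copysign(max(abs(d) - delta, 0.0), d)
            if d_shr != 0.0:
                kept.add(i)  # a gene survives if any class keeps a non-zero difference
            row.append(xbar + scale * d_shr)
        shrunk[k] = row
    return shrunk, sorted(kept)


def predict(model, shrunk, x):
    """Pick the class with the smallest standardized distance minus 2*log(prior)."""
    def score(k):
        return sum((x[i] - shrunk[k][i]) ** 2 / (model["s"][i] + model["s0"]) ** 2
                   for i in range(len(x))) - 2 * math.log(model["prior"][k])
    return min(model["classes"], key=score)


def fitness(model, delta, X_val, y_val):
    """Assumed fitness: validation accuracy, ties broken by fewer selected genes."""
    shrunk, kept = shrink(model, delta)
    acc = sum(predict(model, shrunk, x) == lab for x, lab in zip(X_val, y_val)) / len(y_val)
    return (acc, -len(kept))


def memetic_search(model, X_val, y_val, delta_max, pop_size=8, generations=10, seed=0):
    """Toy memetic search over one threshold: recombine, mutate, then hill-climb each child."""
    rng = random.Random(seed)

    def fit(d):
        return fitness(model, d, X_val, y_val)

    def clamp(d):
        return min(max(d, 0.0), delta_max)

    pop = [rng.uniform(0.0, delta_max) for _ in range(pop_size)]
    step = 0.05 * delta_max
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            child = clamp((a + b) / 2 + rng.gauss(0.0, 0.1 * delta_max))  # blend + mutation
            for _ in range(5):  # local search: move to a better neighbour, else stop
                neighbour = max((clamp(child - step), clamp(child + step)), key=fit)
                if fit(neighbour) <= fit(child):
                    break
                child = neighbour
            children.append(child)
        pop = sorted(pop + children, key=fit, reverse=True)[:pop_size]  # elitist survival
    return pop[0]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: 20 genes, only genes 0-2 shift between the two classes.
    def sample(label):
        return [rng.gauss(2.0 if label == 1 and i < 3 else 0.0, 1.0) for i in range(20)]
    y_tr, y_va = [0] * 15 + [1] * 15, [0] * 10 + [1] * 10
    X_tr, X_va = [sample(c) for c in y_tr], [sample(c) for c in y_va]
    model = fit_nsc(X_tr, y_tr)
    delta = memetic_search(model, X_va, y_va, delta_max=5.0)
    print("delta =", round(delta, 3), "genes kept:", shrink(model, delta)[1])
```

Raising Δ removes more genes, because a gene stays in the model only while at least one class keeps a non-zero shrunken difference. This is why the choice of threshold drives both feature selection and accuracy, and why searching over it is worthwhile. The local search step is what distinguishes the memetic loop from a plain genetic algorithm.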



Paper Citation

in Harvard Style

Dang V. and Lam C. (2016). Gene Selection using a Hybrid Memetic and Nearest Shrunken Centroid Algorithm. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016), ISBN 978-989-758-170-0, pages 190-197. DOI: 10.5220/0005665201900197

in Bibtex Style

author={Vinh Quoc Dang and Chiou-Peng Lam},
title={Gene Selection using a Hybrid Memetic and Nearest Shrunken Centroid Algorithm},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)},
year={2016},
pages={190-197},
doi={10.5220/0005665201900197},
isbn={978-989-758-170-0},

in EndNote Style

JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)
TI - Gene Selection using a Hybrid Memetic and Nearest Shrunken Centroid Algorithm
SN - 978-989-758-170-0
AU - Dang V.
AU - Lam C.
PY - 2016
SP - 190
EP - 197
DO - 10.5220/0005665201900197