Ensemble Method for Prediction of Prostate Cancer from RNA-Seq Data

Yongjun Piao, Nak Hyun Choi, Meijing Li, Minghao Piao, Keun Ho Ryu


The main idea of our research is to develop an ensemble machine learning algorithm to accurately classify prostate cancer using RNA-Seq data. To date, many studies have focused on predicting prostate cancer using microarray data. Recently, RNA-Seq has been rapidly adopted in cancer studies as an alternative to microarrays. Thus, new machine learning algorithms are needed to analyze RNA-Seq data, which have different characteristics from microarray data. The PhD research has been running for one year and has so far focused on analyzing existing state-of-the-art normalization methods, gene expression data analysis, and ensemble methods. In addition, we designed an ensemble feature selection algorithm to select relevant genes from gene expression data. Moreover, we have developed a 'digital' gene expression data simulator for evaluating the performance of the proposed algorithms. The next step will be to construct an accurate ensemble prediction model for the diagnosis of prostate cancer. Finally, the model will be fine-tuned based on feedback from medical doctors.



Paper Citation

in Harvard Style

Piao Y., Choi N., Li M., Piao M. and Ryu K. (2014). Ensemble Method for Prediction of Prostate Cancer from RNA-Seq Data. In Doctoral Consortium - DC3K, (IC3K 2014) ISBN Not Available, pages 51-56. DOI: 10.5220/0005173700510056
