Ensemble Method for Prediction of Prostate Cancer from RNA-Seq Data

Yongjun Piao, Nak Hyun Choi, Meijing Li, Minghao Piao, Keun Ho Ryu


The main goal of our research is to develop an ensemble machine learning algorithm that accurately classifies prostate cancer from RNA-Seq data. To date, many studies have focused on predicting prostate cancer using microarray data. Recently, RNA-Seq has been rapidly adopted in cancer studies as an alternative to microarrays. Thus, new machine learning algorithms are needed to analyze RNA-Seq data, which have different characteristics from microarray data. The PhD research has been running for one year and has so far focused on analyzing existing state-of-the-art normalization methods, gene expression data analysis, and ensemble methods. In addition, we designed an ensemble feature selection algorithm to select relevant genes from gene expression data. Moreover, we have developed a 'digital' gene expression data simulator for evaluating the performance of the proposed algorithms. The next step will be to construct an accurate ensemble prediction model for the diagnosis of prostate cancer. Finally, the model will be fine-tuned based on feedback from medical doctors.
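
As an illustration of what ensemble feature selection can look like in general, the sketch below ranks genes on several bootstrap resamples and keeps those with the best aggregate rank. It is not the algorithm designed in this work, whose details are not given here. The function names (welch_score, ensemble_select), the Welch t-statistic used as the base ranker, the number of rounds, and the toy data are all assumptions chosen for the example.

```python
import random
import statistics


def welch_score(values, labels):
    """Absolute Welch t-statistic separating the two classes for one gene."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    if len(a) < 2 or len(b) < 2:
        return 0.0  # not enough samples of one class in this resample
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def ensemble_select(X, y, k, n_rounds=25, seed=0):
    """Rank genes on bootstrap resamples; keep the k genes with the lowest rank sum."""
    rng = random.Random(seed)
    n_samples, n_genes = len(X), len(X[0])
    rank_sum = [0] * n_genes
    for _ in range(n_rounds):
        # Draw a bootstrap resample of the samples (with replacement).
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        labels = [y[i] for i in idx]
        scores = [welch_score([X[i][g] for i in idx], labels) for g in range(n_genes)]
        # Rank 0 is the most discriminative gene in this round.
        order = sorted(range(n_genes), key=lambda g: -scores[g])
        for rank, g in enumerate(order):
            rank_sum[g] += rank
    best = sorted(range(n_genes), key=lambda g: rank_sum[g])[:k]
    return sorted(best)


# Hypothetical toy data: 20 samples, 6 genes; only gene 2 differs between classes.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [
    [data_rng.gauss(5.0 if (g == 2 and label == 1) else 0.0, 1.0) for g in range(6)]
    for label in y
]
selected = ensemble_select(X, y, k=1)
print(selected)
```

Aggregating ranks over resamples, rather than trusting a single ranking, is one common way to make the selected gene set less sensitive to the particular samples available. Real RNA-Seq counts would typically be normalized before a step like this.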


