IMPROVED DISEASE OUTCOME PREDICTION BASED ON MICROARRAY AND CLINICAL DATA COMBINATION AND PRE-VALIDATION

Jana Šilhavá, Pavel Smrž

Abstract

Combining relevant information from high-dimensional microarray data and low-dimensional clinical variables to predict disease outcome is important for improving treatment decisions. Such a combination may yield more accurate predictions than microarray or clinical data alone. We propose a combination of logistic regression for clinical data and BinomialBoosting for microarray data, together with an extension designed for redundant datasets. Our approach combines microarray and clinical data at the level of decision integration. The extension pre-validates the models built with microarray and clinical data and then calculates weights that determine the relevance of the microarray and clinical models in the combination. We evaluate the approach on several redundant and non-redundant simulated datasets and then test it on two real benchmark datasets. It improves outcome prediction on non-redundant simulated datasets and does not degrade it on redundant simulated datasets. Pre-validation of the built models improves outcome prediction by up to 4% on a real redundant dataset.
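The pre-validation and weighting steps can be illustrated with a minimal sketch. It is not the authors' implementation. A plain gradient-descent logistic regression stands in for both models (the paper uses BinomialBoosting for the microarray data), the data are simulated, and the weighting rule (pre-validated accuracy above chance, normalized) is an assumption, since the abstract does not state how the weights are calculated. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=200, lr=0.5):
    """Gradient-descent logistic regression; returns a predict function.
    Stand-in for both the clinical and the microarray model."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * c for a, c in zip(w, xi)) + b) - yi
            gb += err
            for j, xj in enumerate(xi):
                gw[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return lambda x: sigmoid(sum(a * c for a, c in zip(w, x)) + b)

def prevalidate(fit, X, y, k=5):
    """Pre-validation: each sample is predicted by a model fit without its fold."""
    n = len(X)
    preds = [0.0] * n
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in range(fold, n, k):
            preds[i] = model(X[i])
    return preds

def accuracy(preds, y):
    return sum((p >= 0.5) == (yi == 1) for p, yi in zip(preds, y)) / len(y)

def relevance_weights(pv_clin, pv_micro, y):
    """Assumed rule: weight each model by its pre-validated accuracy above chance."""
    s_c = max(accuracy(pv_clin, y) - 0.5, 0.0)
    s_m = max(accuracy(pv_micro, y) - 0.5, 0.0)
    total = s_c + s_m
    return (0.5, 0.5) if total == 0.0 else (s_c / total, s_m / total)

# Simulated data: 2 clinical variables, 10 "gene" variables per sample.
random.seed(0)
def make_sample():
    clin = [random.gauss(0, 1) for _ in range(2)]
    micro = [random.gauss(0, 1) for _ in range(10)]
    label = 1 if clin[0] + micro[0] + random.gauss(0, 0.5) > 0 else 0
    return clin, micro, label

samples = [make_sample() for _ in range(80)]
train, test = samples[:60], samples[60:]
Xc = [s[0] for s in train]
Xm = [s[1] for s in train]
y = [s[2] for s in train]

# Weights come from pre-validated predictions, not from resubstitution fits.
w_clin, w_micro = relevance_weights(
    prevalidate(fit_logistic, Xc, y), prevalidate(fit_logistic, Xm, y), y)

# Decision-level combination: weighted average of the two models' probabilities.
clin_model, micro_model = fit_logistic(Xc, y), fit_logistic(Xm, y)
combined = [w_clin * clin_model(c) + w_micro * micro_model(m) for c, m, _ in test]
print("weights (clinical, microarray):", w_clin, w_micro)
```

Computing the weights from pre-validated predictions keeps a model that overfits its training data from receiving an inflated weight, which is the role pre-validation plays in this sketch.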


Paper Citation


in Harvard Style

Šilhavá J. and Smrž P. (2010). IMPROVED DISEASE OUTCOME PREDICTION BASED ON MICROARRAY AND CLINICAL DATA COMBINATION AND PRE-VALIDATION. In Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010) ISBN 978-989-674-019-1, pages 108-113. DOI: 10.5220/0002697601080113


in Bibtex Style

@conference{bioinformatics10,
author={Jana Šilhavá and Pavel Smrž},
title={IMPROVED DISEASE OUTCOME PREDICTION BASED ON MICROARRAY AND CLINICAL DATA COMBINATION AND PRE-VALIDATION},
booktitle={Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)},
year={2010},
pages={108-113},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002697601080113},
isbn={978-989-674-019-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)
TI - IMPROVED DISEASE OUTCOME PREDICTION BASED ON MICROARRAY AND CLINICAL DATA COMBINATION AND PRE-VALIDATION
SN - 978-989-674-019-1
AU - Šilhavá J.
AU - Smrž P.
PY - 2010
SP - 108
EP - 113
DO - 10.5220/0002697601080113