HAPLOTYPE-BASED CLASSIFIERS TO PREDICT INDIVIDUAL SUSCEPTIBILITY TO COMPLEX DISEASES - An Example for Multiple Sclerosis

María M. Abad-Grau, Nuria Medina-Medina, Andrés Masegosa, Serafín Moral

Abstract

The enormous amount of genetic data now being produced by the explosion of genome-wide association studies has driven a considerable effort to build genetics-based models that predict individual susceptibility to complex diseases. However, most of these models show consistently low accuracy. We hypothesize that a main cause of this low accuracy is the drastic reduction of the genetic information the classifiers consider, and we propose a three-fold solution: using individual haplotype data instead of genotype data, using markers from the whole genome instead of a more stringent selection, and using risk variants that span several markers instead of only one or two. We compared the performance of our approach with that of current approaches for predicting individual genetic risk of multiple sclerosis, and found that our method yielded significantly more accurate classifiers.
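
The abstract does not say which classifier is used or how multi-marker risk variants are defined. The sketch below is therefore only an illustration of the three ideas under stated assumptions, not the authors' method. It assumes phased, biallelic haplotypes coded as 0/1. It treats every sliding window of `width` consecutive markers as one multi-marker variant, uses all markers with no preselection, and plugs the window patterns into a naive Bayes classifier with Laplace smoothing. The names `windows` and `HaplotypeWindowNB`, and the toy data, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def windows(haplotype, width):
    """Split one phased haplotype (a sequence of 0/1 alleles) into overlapping
    windows of `width` consecutive markers; each window pattern is treated as
    one multi-marker variant (an assumption made for this sketch)."""
    return [tuple(haplotype[i:i + width]) for i in range(len(haplotype) - width + 1)]


class HaplotypeWindowNB:
    """Hypothetical naive Bayes classifier over multi-marker haplotype windows.

    Each individual is a pair of phased haplotypes typed at the same markers.
    Window patterns from both haplotypes are counted per class and per window
    position, with Laplace smoothing over all 2**width possible patterns.
    Overlapping windows are treated as independent (the naive Bayes assumption).
    """

    def __init__(self, width=3, alpha=1.0):
        self.width = width
        self.alpha = alpha

    def fit(self, individuals, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        # counts[c][pos][pattern]: times `pattern` was seen at window `pos` in class c
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.totals = {c: Counter() for c in self.classes}
        for (h1, h2), c in zip(individuals, labels):
            for hap in (h1, h2):
                for pos, pattern in enumerate(windows(hap, self.width)):
                    self.counts[c][pos][pattern] += 1
                    self.totals[c][pos] += 1
        return self

    def log_scores(self, individual):
        """Unnormalized log posterior of each class for one (h1, h2) pair."""
        n_patterns = 2 ** self.width  # possible patterns for biallelic 0/1 markers
        scores = {}
        for c in self.classes:
            score = math.log(self.prior[c])
            for hap in individual:
                for pos, pattern in enumerate(windows(hap, self.width)):
                    num = self.counts[c][pos][pattern] + self.alpha
                    den = self.totals[c][pos] + self.alpha * n_patterns
                    score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, individual):
        scores = self.log_scores(individual)
        return max(scores, key=scores.get)


# Toy usage with made-up haplotypes (not data from the study).
cases = [((1, 1, 0, 1, 0), (1, 1, 0, 0, 0)), ((1, 1, 0, 1, 1), (1, 1, 0, 1, 0))]
controls = [((0, 0, 1, 0, 1), (0, 1, 1, 0, 1)), ((0, 0, 1, 1, 1), (0, 0, 1, 0, 1))]
model = HaplotypeWindowNB(width=3).fit(cases + controls, ["case"] * 2 + ["control"] * 2)
print(model.predict(((1, 1, 0, 1, 0), (1, 1, 0, 1, 1))))  # prints: case
```

Setting `width=1` reduces this sketch to per-marker allele counting. That corresponds roughly to the single-marker view the abstract argues against, because it discards which alleles co-occur on the same chromosome.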



Paper Citation


in Harvard Style

Abad-Grau M. M., Medina-Medina N., Masegosa A. and Moral S. (2012). HAPLOTYPE-BASED CLASSIFIERS TO PREDICT INDIVIDUAL SUSCEPTIBILITY TO COMPLEX DISEASES - An Example for Multiple Sclerosis. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012), ISBN 978-989-8425-90-4, pages 360-366. DOI: 10.5220/0003874003600366


in Bibtex Style

@conference{bioinformatics12,
author={María M. Abad-Grau and Nuria Medina-Medina and Andrés Masegosa and Serafín Moral},
title={HAPLOTYPE-BASED CLASSIFIERS TO PREDICT INDIVIDUAL SUSCEPTIBILITY TO COMPLEX DISEASES - An Example for Multiple Sclerosis},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},
year={2012},
pages={360-366},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003874003600366},
isbn={978-989-8425-90-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)
TI - HAPLOTYPE-BASED CLASSIFIERS TO PREDICT INDIVIDUAL SUSCEPTIBILITY TO COMPLEX DISEASES - An Example for Multiple Sclerosis
SN - 978-989-8425-90-4
AU - Abad-Grau M. M.
AU - Medina-Medina N.
AU - Masegosa A.
AU - Moral S.
PY - 2012
SP - 360
EP - 366
DO - 10.5220/0003874003600366