Forests of Latent Tree Models for Genome-Wide Association Studies

Phan Duc Thanh

2014

Abstract

In biomedical research, the massive amounts of data generated by high-throughput genotyping technologies have made genome-wide association studies (GWASs) feasible, and they are now considered a method of choice for statistically connecting DNA variations with human disorders. Since GWASs rely on the principle of linkage disequilibrium (LD) at the population level, it is essential to be able to model LD in the human genome effectively. At the intersection of multiple fields, this thesis studies the design and implementation of methods based on Bayesian networks (BNs) in the context of GWASs, for investigating the genetic architecture of complex diseases.


Paper Citation


in Harvard Style

Duc Thanh P. (2014). Forests of Latent Tree Models for Genome-Wide Association Studies. In Doctoral Consortium - DCBIOSTEC, (BIOSTEC 2014) ISBN Not Available, pages 74-81


in Bibtex Style

@conference{dcbiostec14,
author={Phan Duc Thanh},
title={Forests of Latent Tree Models for Genome-Wide Association Studies},
booktitle={Doctoral Consortium - DCBIOSTEC, (BIOSTEC 2014)},
year={2014},
pages={74-81},
publisher={SciTePress},
organization={INSTICC},
doi={},
isbn={Not Available},
}


in EndNote Style

TY - CONF
JO - Doctoral Consortium - DCBIOSTEC, (BIOSTEC 2014)
TI - Forests of Latent Tree Models for Genome-Wide Association Studies
SN - Not Available
AU - Duc Thanh P.
PY - 2014
SP - 74
EP - 81
DO -