# FORESTS OF LATENT TREE MODELS FOR THE DETECTION OF GENETIC ASSOCIATIONS

### Christine Sinoquet, Raphaël Mourad, Philippe Leray

#### Abstract

With population aging and rising health care costs, understanding the causal basis of common genetic diseases has become pressing. The high dimensionality and complexity of genetic data hamper the detection of genetic associations. To mitigate the two core risks (missing the causal factor and reporting spurious discoveries), machine learning offers an appealing alternative to standard statistical approaches. A novel class of probabilistic graphical models, the forest of latent tree models, has recently been proposed to trade off faithful modeling of data dependencies against tractability. In this paper, we evaluate the soundness of this modeling approach in an association genetics context. We performed intensive tests on realistic simulated data under various controlled conditions, and we also tested the model on real data. Besides reducing data dimension through latent variables, the model is shown empirically to capture indirect genetic associations with the disease, on both simulated and real data. In the forest, strong associations are observed between the disease and the ancestor nodes of the causal genetic marker node, whereas only very weak associations are obtained for other nodes.

#### Paper Citation

#### in Harvard Style

Sinoquet C., Mourad R. and Leray P. (2012). **FORESTS OF LATENT TREE MODELS FOR THE DETECTION OF GENETIC ASSOCIATIONS**. In *Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)*, ISBN 978-989-8425-90-4, pages 5-14. DOI: 10.5220/0003703400050014

#### in Bibtex Style

@conference{bioinformatics12,

author={Christine Sinoquet and Raphaël Mourad and Philippe Leray},

title={FORESTS OF LATENT TREE MODELS FOR THE DETECTION OF GENETIC ASSOCIATIONS},

booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},

year={2012},

pages={5-14},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0003703400050014},

isbn={978-989-8425-90-4},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)

TI - FORESTS OF LATENT TREE MODELS FOR THE DETECTION OF GENETIC ASSOCIATIONS

SN - 978-989-8425-90-4

AU - Sinoquet C.

AU - Mourad R.

AU - Leray P.

PY - 2012

SP - 5

EP - 14

DO - 10.5220/0003703400050014