Authors:
Christine Sinoquet 1; Raphaël Mourad 2 and Philippe Leray 2
Affiliations:
1 Université de Nantes, France; 2 Ecole Polytechnique de l’Université de Nantes, France
Keyword(s):
Probabilistic graphical model, Bayesian network, Latent tree model, Detection of genetic association, Latent variable, Data dimension reduction.
Related Ontology Subjects/Areas/Topics:
Bioinformatics; Biomedical Engineering; Data Mining and Machine Learning; Model Design and Evaluation
Abstract:
Together with the concern over population aging, increasing health care costs call for a better understanding of the causal basis of common genetic diseases. The high dimensionality and complexity of genetic data hamper the detection of genetic associations. To alleviate the core risks (missing the causal factor, spurious discoveries), machine learning offers an appealing alternative framework to standard statistical approaches. A novel class of probabilistic graphical models, the forest of latent tree models, has recently been proposed to obtain a trade-off between faithful modeling of data dependences and tractability. In this paper, we evaluate the soundness of this modeling approach in an association genetics context. We have performed intensive tests, in various controlled conditions, on realistic simulated data. We have also tested the model on real data. Besides guaranteeing data dimension reduction through latent variables, the model is empirically shown to capture indirect genetic associations with the disease, on both simulated and real data. Strong associations are evidenced between the disease and the ancestor nodes of the causal genetic marker node in the forest. In contrast, very weak associations are obtained for other nodes.