Authors:
Hugo Boisaubert and Christine Sinoquet
Affiliation:
LS2N, UMR CNRS 6004, University of Nantes, France
Keyword(s):
Gene-gene Interaction, Machine Learning, Markov Blanket, High-dimensional Data, Comparative Analysis.
Abstract:
In this paper, we report three contributions in the field of gene-gene interaction (epistasis) detection. Our first contribution is a comparative analysis, on real-world datasets, of five approaches designed to tackle epistasis detection. The aim is to help address the lack of feedback on how published methods behave in real-life epistasis detection. We focus on four state-of-the-art approaches encompassing random forests, Bayesian inference, optimization techniques and Markov blanket learning. In addition, a recently developed approach, SMMB-ACO (Stochastic Multiple Markov Blankets with Ant Colony Optimization), is included in the comparison. Thus, our second contribution is an assessment of the behavior of SMMB-ACO on real-world data, whereas SMMB-ACO has so far been evaluated mainly through small-scale simulations. We used a published case-control dataset related to Crohn’s disease. Focusing on pairwise interactions, we report great heterogeneity across the methods in running times, memory occupancies, numbers of interactions output, and distributions of the p-values and odds ratios characterizing the interactions. Our third contribution is a proof-of-concept study in the context of genetic association interaction studies, intended to foster alternatives to analyses driven by prior biological knowledge. The principle is to cross the results of several machine learning methods whose intrinsic mechanisms greatly differ, in order to provide a prioritized list of interactions to be validated experimentally. Focusing on the interactions identified in common by at least two methods, we obtained a prioritized list of 56 interactions, from which we could infer one interaction network of size 7, four networks of size 4 and six of size 3.
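The result-crossing principle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the method names, the SNP identifiers and the toy data are hypothetical, and two assumptions are made that the abstract does not state, namely that interactions are ranked by the number of methods reporting them, and that a "network" is a connected component of the graph whose edges are the retained interactions, with size counted in SNPs.

```python
from collections import defaultdict


def consensus_interactions(results, min_methods=2):
    """Keep the pairwise interactions reported by at least `min_methods` methods.

    `results` maps a method name to an iterable of (snp_a, snp_b) pairs.
    Pairs are treated as unordered. The ranking by number of supporting
    methods is an assumption made for this sketch.
    """
    support = defaultdict(set)
    for method, pairs in results.items():
        for snp_a, snp_b in pairs:
            support[frozenset((snp_a, snp_b))].add(method)
    kept = [(pair, methods) for pair, methods in support.items()
            if len(pair) == 2 and len(methods) >= min_methods]
    # Most supported interactions first; ties broken by SNP names.
    return sorted(kept, key=lambda item: (-len(item[1]), sorted(item[0])))


def interaction_networks(pairs):
    """Group interactions into connected components (union-find).

    Treating a "network" as a connected component is an assumption.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for pair in pairs:
        snp_a, snp_b = sorted(pair)
        parent[find(snp_a)] = find(snp_b)

    components = defaultdict(set)
    for node in list(parent):
        components[find(node)].add(node)
    return sorted(components.values(), key=len, reverse=True)


# Hypothetical toy outputs of three methods (not data from the paper).
toy_results = {
    "method_A": [("rs1", "rs2"), ("rs2", "rs3"), ("rs4", "rs5")],
    "method_B": [("rs2", "rs1"), ("rs3", "rs2"), ("rs6", "rs7")],
    "method_C": [("rs1", "rs2"), ("rs4", "rs5")],
}

ranked = consensus_interactions(toy_results, min_methods=2)
networks = interaction_networks(pair for pair, _ in ranked)

for pair, methods in ranked:
    print(sorted(pair), "reported by", sorted(methods))
print("network sizes:", [len(n) for n in networks])
```

In this toy example, the pair (rs6, rs7) is reported by a single method and is therefore discarded, while the three retained interactions form two networks.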