Based on both simulated and real data analyses, this
paper promotes the use of FLTMs as a simple and useful
framework for disease association detection in human
genetics. Indirect genetic associations are captured
efficiently for two main reasons: (i) the latent nodes
that are ancestors of the causal SNP succeed in capturing
indirect associations with the phenotype; (ii) conversely,
the other latent nodes show, overall, only very weak
associations. In other words, this property makes it
possible to distinguish between true and false indirect
genetic associations.
The numbers of SNPs in the benchmarks were
limited. Nonetheless, this limitation does not bias
the characterization of information fading in the FLTM
hierarchies: bottom-up information decay depends on
the forest depth, not on the forest width. It must be
emphasized that our tests were not designed to meet the
small n, large p condition (many more variables (SNPs)
than subjects) encountered in genome-wide association
studies (GWASs). Again, this does not bias our study:
over thirty-six scenarios, we have shown that the
overwhelming majority (about three quarters) of false
positives are confined to a single tree, namely the one
harbouring the causal SNP (the causal tree). Under GWAS
conditions, the forest width may well be far larger than
that observed in our tests, but the false positives are
expected to remain confined, for the most part, to the
causal tree.
In a previous work, we developed a scalable FLTM
learning algorithm, thus reaching orders of
magnitude consistent with GWAS demands (10
variables, 2000 individuals). In addition to scalability,
data dimension reduction argues for the use of FLTM-based
modeling in GWASs: the issue of multiple hypothesis
testing in GWASs would be addressed by testing a small
number of latent variables instead of a large number of
observed variables. However, before envisaging an
FLTM-based GWAS, an inescapable prerequisite was to test
whether the bottom-up information fading through the
forest would nevertheless allow reliable association
detection. Equally necessary was a close examination of
the proportion of latent variables erroneously associated
with the disease.
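
The benefit of testing fewer variables can be illustrated with a minimal
sketch. It assumes a plain Bonferroni correction, which is used here only
as a familiar reference point and is not claimed to be the correction
applied in this work. The function name and the two variable counts are
hypothetical and chosen purely for illustration.

```python
def bonferroni_per_test_alpha(family_wise_alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate
    at or below family_wise_alpha (Bonferroni correction)."""
    if not 0.0 < family_wise_alpha < 1.0:
        raise ValueError("family_wise_alpha must lie in (0, 1)")
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_wise_alpha / n_tests


# Hypothetical counts, for illustration only (not taken from the paper).
n_observed_snps = 100_000
n_latent_variables = 2_000

alpha_observed = bonferroni_per_test_alpha(0.05, n_observed_snps)
alpha_latent = bonferroni_per_test_alpha(0.05, n_latent_variables)

print(f"per-test level, observed SNPs:    {alpha_observed:.2e}")
print(f"per-test level, latent variables: {alpha_latent:.2e}")
```

With fewer tests, the per-test threshold is less stringent for the same
family-wise error rate, which is the sense in which dimension reduction
eases the multiple testing burden.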
As preliminary work towards GWAS application, the
present contribution asserts the soundness of the
FLTM model for association detection. In addition, we
have designed a procedure to guarantee a given
family-wise (type I) error rate through the computation
of layer-specific per-test error rates. The successful
testing of our algorithm under a wide spectrum of
conditions supports its integration into a GWAS tool.
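
One simple way to derive layer-specific per-test error rates from a target
family-wise error rate is sketched below. This is an assumption made for
illustration, not the procedure designed in this work, whose details are
not given in this section. The sketch splits the family-wise budget
equally across layers and then equally across the latent variables tested
in each layer, so that the union bound keeps the overall type I error rate
at or below the target. The function name and the per-layer counts are
hypothetical.

```python
from typing import List


def layer_specific_alphas(family_wise_alpha: float,
                          tests_per_layer: List[int]) -> List[float]:
    """Return one per-test error rate per layer.

    Assumed scheme (Bonferroni-style): each layer receives an equal share
    of family_wise_alpha, and that share is divided equally among the
    tests performed in the layer.
    """
    if not 0.0 < family_wise_alpha < 1.0:
        raise ValueError("family_wise_alpha must lie in (0, 1)")
    if not tests_per_layer or any(n <= 0 for n in tests_per_layer):
        raise ValueError("each layer must contain at least one test")
    layer_budget = family_wise_alpha / len(tests_per_layer)
    return [layer_budget / n_tests for n_tests in tests_per_layer]


# Hypothetical forest: number of latent variables tested in layers 1, 2, 3.
tests_per_layer = [40, 8, 2]
alphas = layer_specific_alphas(0.05, tests_per_layer)

for depth, (n_tests, alpha) in enumerate(zip(tests_per_layer, alphas), start=1):
    print(f"layer {depth}: {n_tests} tests, per-test error rate {alpha:.2e}")

# Union bound on the family-wise error rate implied by these rates.
family_wise_bound = sum(n * a for n, a in zip(tests_per_layer, alphas))
```

Under this scheme, sparsely populated upper layers are tested at a less
stringent per-test level than the densely populated first layer; other
allocations of the budget across layers are equally possible.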
