tion; (3) relevance of the partitioning method to guide
an FLTM-based GWAS pinpointing regions with significantly
associated SNPs. The Louvain method was shown to differ
slightly from CAST and DBSCAN from the clustering viewpoint.
However, this difference did not translate into a difference
in GWAS performance. Therefore, to the initial
question “Which clustering method should be chosen?”,
the answer for the Crohn’s disease WTCCC data
set relative to chromosome 2 would rather prioritize
ease of parameter tuning. In our experiments so
far, the FLTM learning algorithm seems robust to the
choice of the clustering method, provided that the in-
trinsic parameters of the latter are appropriately set.
Future work includes extending the current analysis
to other chromosomes, for the WTCCC data set, as
well as to other diseases, and extending our analysis
to other clustering methods.
This is the first time that the FLTM learning algorithm
has been run on real GWAS data. It is questionable
whether the present study should be complemented by
intensive experiments run on simulated GWAS data
sets. Given the long processing times that GWASs
entail, and the recurring question
of generating sufficiently realistic GWAS data, a less
systematic approach, encompassing more diseases,
seems entirely appropriate.
Finally, to return to the multilocus aspect of the
type of GWAS addressed here, one of our next tasks
is to compare the FLTM-based GWAS strategy with
the few other existing scalable multilocus approaches,
including BEAGLE (Browning and Browning, 2007).
The project SAMOGWAS (Specific Advanced MOd-
els for Genome Wide Association Studies) is sup-
ported by the French National Research Agency
(Agence Nationale de la Recherche, ANR). The au-
thors are also grateful to the Wellcome Trust Case
Control Consortium for providing the GWAS data
used in this study.
