window size, target size (regulated by the adjustment)
and the target number of regions.
4 CONCLUSIONS
We presented a new method of computing broad
regions associated with chromatin modifications that
is applicable even when ChIP data do not exhibit
large fold changes between the affected regions and
the rest of the genome. Although our method was
conceived and implemented before the publication of
EDD (Lund et al., 2014), it shares three aspects with
EDD: (a) it scores windows rather than applying a
cut-off that separates “good” from “bad” windows,
(b) it bases the scores on the ranks of the windows
and (c) it groups windows into regions by solving a
natural combinatorial problem.
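To make aspects (a) and (b) concrete, the following Python sketch turns per-window ChIP and control read counts into continuous, rank-based scores instead of a binary call. The score formula is purely hypothetical and is not the score(w) defined in Section 2: windows are ranked by a pseudocounted ChIP/control ratio, each rank is mapped to a quantile q in (0, 1), and the score is q - 0.5 - α. This is chosen only so that, as with the parameter α of our method, a larger α leaves fewer windows with a positive score.

```python
from typing import List


def rank_scores(chip: List[int], control: List[int], alpha: float) -> List[float]:
    """Hypothetical rank-based window scores (illustration only, not Section 2's score)."""
    n = len(chip)
    # Pseudocounted enrichment ratio; library-size normalization would not change the ranks.
    ratio = [(c + 1) / (b + 1) for c, b in zip(chip, control)]
    order = sorted(range(n), key=lambda i: ratio[i])  # stable sort: ties keep input order
    q = [0.0] * n
    for r, i in enumerate(order):
        q[i] = (r + 0.5) / n  # quantile rank of window i, strictly inside (0, 1)
    # Hypothetical shift: a larger alpha leaves fewer windows with a positive score.
    return [qi - 0.5 - alpha for qi in q]


if __name__ == "__main__":
    chip, control = [10, 1, 5, 3], [2, 2, 2, 2]
    for alpha in (0.0, 0.2):
        scores = rank_scores(chip, control, alpha)
        print(alpha, sum(s > 0 for s in scores))  # number of positively scored windows
```

Because every window receives a graded score, no single cut-off decides which windows are “good”; the grouping step later decides how positive and negative windows combine into regions.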
Our scoring method models the distributions of
ChIP and control reads more accurately than EDD
does; thus, we avoid the positive bias toward
selecting windows in the least accessible parts of
the genome. Whether this model is adequate remains
an open question, and we expect further progress in
this direction.
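One plausible mechanism behind such a bias (our illustration, not an analysis carried out in this paper): if windows are ranked by a raw ChIP/control ratio, windows with few reads in both samples reach a high ratio by chance far more often than well-covered windows. The Python sketch below assumes that, without any enrichment, the ChIP and control counts of a window are independent Poisson variables with the same mean λ, and computes (up to a negligible truncated tail) the probability that the pseudocounted ratio (X + 1)/(Y + 1) reaches 2.

```python
import math


def poisson_pmf(n: int, lam: float) -> float:
    """Poisson probability P(N = n) for mean lam > 0, computed in log space."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


def p_ratio_at_least(lam: float, fold: float = 2.0) -> float:
    """P((X + 1) / (Y + 1) >= fold) for independent X, Y ~ Poisson(lam), i.e. no enrichment."""
    upper = int(lam + 20 * math.sqrt(lam) + 30)  # truncation point; the tail beyond is negligible
    pmf = [poisson_pmf(n, lam) for n in range(upper + 1)]
    tail = [0.0] * (upper + 2)  # tail[n] = P(X >= n), truncated at upper
    for n in range(upper, -1, -1):
        tail[n] = tail[n + 1] + pmf[n]
    total = 0.0
    for y in range(upper + 1):
        need = math.ceil(fold * (y + 1) - 1)  # smallest x with x + 1 >= fold * (y + 1)
        total += pmf[y] * tail[min(max(need, 0), upper + 1)]
    return total


if __name__ == "__main__":
    # Same expected coverage in ChIP and control: low vs high read depth per window.
    for lam in (2.0, 200.0):
        print(lam, p_ratio_at_least(lam))
```

With about two reads per window, roughly one window in five reaches a two-fold ratio by chance alone; with about 200 reads per window, practically none does, so a ranking by raw ratios would favor sparsely covered windows.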
The combinatorial problem that we apply, 1DFS, is
much more natural than the iterative selection of
fragments with the highest sum of scores, which can
excessively merge “positive” regions with the
“negative” regions that separate them. Lund et al.
(2014) introduced a gap penalty (decreasing all
negative scores by a constant) to reduce that
tendency, but we suspect that this penalty is one of
the reasons why EDD works at such low granularity.
Although 1DFS is a global optimization problem, we
have found a solution that is very efficient and easy
to implement.
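For illustration only, assume that 1DFS asks for at most k disjoint fragments of consecutive windows with the maximum total score (our reading of the problem, stated here as an assumption). The Python sketch below solves this formulation exactly with a textbook O(nk) dynamic program, which is not necessarily the efficient solution mentioned above. For comparison, the hypothetical greedy_fragments is a simplified stand-in for iterative selection, not EDD's code: it repeatedly takes the maximum-sum fragment among the unused windows (Kadane's scan), optionally after decreasing every negative score by a gap penalty.

```python
from typing import List, Tuple

NEG = float("-inf")
Fragment = Tuple[int, int]  # inclusive window indices (start, end)


def best_k_fragments(scores: List[float], k: int) -> Tuple[float, List[Fragment]]:
    """Exact O(n*k) DP: at most k disjoint fragments with maximum total score."""
    # closed[j]: best (total, fragments) using exactly j fragments in the prefix seen so far
    # open_[j]: same, but the j-th fragment ends at the current window
    closed = [(0.0, [])] + [(NEG, [])] * k
    open_ = [(NEG, [])] * (k + 1)
    for i, s in enumerate(scores):
        for j in range(k, 0, -1):  # descending j: closed[j - 1] still refers to window i - 1
            ext_total, ext_frags = open_[j]
            new_total, new_frags = closed[j - 1]
            if ext_frags and ext_total >= new_total:  # extend the fragment ending at i - 1
                open_[j] = (ext_total + s, ext_frags[:-1] + [(ext_frags[-1][0], i)])
            else:  # start a new fragment at window i
                open_[j] = (new_total + s, new_frags + [(i, i)])
            if open_[j][0] > closed[j][0]:
                closed[j] = open_[j]
    return max(closed, key=lambda t: t[0])


def greedy_fragments(
    scores: List[float], k: int, gap_penalty: float = 0.0
) -> Tuple[float, List[Fragment]]:
    """Hypothetical iterative selection: repeatedly take the max-sum fragment of unused windows."""
    # Gap penalty as described in the text: decrease every negative score by a constant.
    adj = [x - gap_penalty if x < 0 else x for x in scores]
    used = [False] * len(adj)
    frags: List[Fragment] = []
    for _ in range(k):
        best, best_iv = 0.0, None
        run, start = 0.0, None
        for i, x in enumerate(adj):  # Kadane's scan, restarted at already-used windows
            if used[i]:
                run, start = 0.0, None
                continue
            if start is None or run <= 0:
                run, start = x, i
            else:
                run += x
            if run > best:
                best, best_iv = run, (start, i)
        if best_iv is None:  # no fragment with a positive sum remains
            break
        frags.append(best_iv)
        for i in range(best_iv[0], best_iv[1] + 1):
            used[i] = True
    # Report the total of the ORIGINAL scores so that both methods are comparable.
    return sum(sum(scores[a:b + 1]) for a, b in frags), sorted(frags)


if __name__ == "__main__":
    toy = [4, 4, -3, 4, 4, -20, 2]
    print(best_k_fragments(toy, 3))     # exact optimum for k = 3
    print(greedy_fragments(toy, 3))     # greedy, no gap penalty
    print(greedy_fragments(toy, 3, 6))  # greedy, gap penalty of 6 per negative window
```

On the toy scores with k = 3, the dynamic program returns the fragments (0, 1), (3, 4) and (6, 6) with total 18, whereas the greedy scan first takes (0, 4), merging both positive runs with the negative window between them, and ends with total 15. A gap penalty of 6 makes the greedy scan split the runs in this example, but only by changing every negative score.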
Our method uses only two parameters, k and α,
but how to select them properly remains an open
problem. In Section 2, we set k so that the number
of LADs is close to the number reported in studies
that use the DamID method (see Peric-Hupkes et al.,
2010). Parameter α can be selected in different
ways. As α increases, the proportion of windows
with positive score(w) decreases, as does the total
length of the identified LADs. However, when we
decrease α too much, the p-values of the computed
LADs tend to increase, and we cannot suggest a
statistic that would allow α to be optimized. We
tested our program and EDD on six genes confirmed
by ChIP-qPCR to be in LADs (data not shown). We
found that increasing α may paradoxically exclude
some of these genes, whereas using the same value,
α = 0.12, for all samples led to the consistent
inclusion of 5 of the 6 genes in the LADs computed
for all four e9.5 samples. LADs computed by EDD
(which automatically adjusts its parameters to
optimize the p-values) consistently included exactly
3 of these genes.
This limited evidence suggests that, at present,
there is no better way to select the parameters than
to use whatever knowledge is available, preferably
genomic positions confirmed to lie inside or outside
LADs, and to choose parameters consistent with
that knowledge. The situation is similar for
identifying short peaks of transcription factors,
because existing programs can produce “false
positives”, i.e., statistically significant peaks that
are too weak to have a biological impact. Therefore,
these programs provide options for setting
parameters such as the maximum p-value or FDR and
the minimum fold change.
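The knowledge-driven selection of α described above can be sketched in Python as follows. The helper names, the half-open interval convention and the toy coordinates are hypothetical. For each candidate α, the sketch counts the validated positions that are classified correctly in every sample, mirroring the requirement of consistent inclusion across the four e9.5 samples, and keeps the α with the highest count.

```python
from typing import Dict, List, Sequence, Tuple

Region = Tuple[int, int]  # half-open genomic interval [start, end) on one chromosome


def covers(regions: Sequence[Region], pos: int) -> bool:
    """True if pos lies inside any of the regions."""
    return any(start <= pos < end for start, end in regions)


def consistent_hits(
    samples: Sequence[Sequence[Region]], inside: Sequence[int], outside: Sequence[int]
) -> int:
    """Number of validated positions classified correctly in EVERY sample."""
    ok_in = sum(all(covers(r, p) for r in samples) for p in inside)
    ok_out = sum(all(not covers(r, p) for r in samples) for p in outside)
    return ok_in + ok_out


def pick_alpha(
    regions_by_alpha: Dict[float, List[List[Region]]],
    inside: Sequence[int],
    outside: Sequence[int],
) -> float:
    """Candidate alpha with the most consistent hits; ties go to the first candidate listed."""
    return max(
        regions_by_alpha,
        key=lambda a: consistent_hits(regions_by_alpha[a], inside, outside),
    )


if __name__ == "__main__":
    # Toy LAD calls for two samples at two candidate alphas (made-up coordinates).
    calls = {
        0.12: [[(0, 100), (200, 300)], [(0, 120), (210, 300)]],
        0.30: [[(0, 50)], [(0, 60)]],
    }
    print(pick_alpha(calls, inside=[10, 250], outside=[150]))
```

Requiring agreement in every sample, rather than on average, rewards parameter values that place the validated genes consistently, which is the property used above to compare our program with EDD.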
REFERENCES
Bernstein, B. E. et al. (2010) The NIH Roadmap Epigenomics
Mapping Consortium. Nat. Biotechnol., 28(10),
1045–1048.
Ernst, J. et al. (2011) Mapping and analysis of chromatin
state dynamics in nine human cell types. Nature,
473(7345), 43–49.
Guelen, L. et al. (2008) Domain organization of human
chromosomes revealed by mapping of nuclear lamina
interactions. Nature, 453(7197), 948–951.
Kharchenko, P. V. et al. (2008) Design and analysis of
ChIP-seq experiments for DNA-binding proteins. Nat.
Biotechnol., 26(12), 1351–1359.
Lund, E. et al. (2014) Enriched domain detector: a program
for detection of wide genomic enrichment domains
robust against local variations. Nucleic Acids Res.,
42(11), e92.
Mikkelsen, T. S. et al. (2007) Genome-wide maps of
chromatin state in pluripotent and lineage-committed
cells. Nature, 448(7153), 553–560.
Padeken, J. and Heun, P. (2014) Nucleolus and nuclear
periphery: velcro for heterochromatin. Curr. Opin.
Cell Biol., 28, 54–60.
Peric-Hupkes, D. et al. (2010) Molecular maps of the
reorganization of genome-nuclear lamina interactions
during differentiation. Mol. Cell, 38(4), 603–613.
Shah, P. P. et al. (2013) Lamin B1 depletion in senescent
cells triggers large-scale changes in gene expression
and the chromatin landscape. Genes Dev., 27(16),
1787–1799.
Xu, H. et al. (2010) A signal-noise model for significance
analysis of ChIP-seq with negative control.
Bioinformatics, 26(9), 1199–1204.