Table 4: Average PPV with background model M1, L = 7, σ = 0.7.
Tissue       EP₂    N₂
Limb         0.55   0.53
Forebrain    0.56   0.53
Midbrain     0.48   0.49
Heart        0.53   0.53
Average      0.53   0.52
obtained by ensuring a maximum of repetitive sequence
for every negative sample, as done in (Göke
et al., 2012). Although the PPV values decrease compared
to the previous tables, these latter experiments
confirm that enhancers specific to the same tissue
have higher sequence similarity, and can therefore
be detected with alignment-free methods.
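The Average row of Table 4 appears to be the arithmetic mean of the four tissue rows. The minimal Python sketch below checks this by averaging the per-tissue values; it assumes the standard definition PPV = TP/(TP + FP), since the exact protocol used to count true and false positives is not restated in this section. The names `table4`, `ppv` and `column_average` are hypothetical and used only for illustration.

```python
# Per-tissue PPV values copied from Table 4 (columns: EP2, N2).
table4 = {
    "Limb": (0.55, 0.53),
    "Forebrain": (0.56, 0.53),
    "Midbrain": (0.48, 0.49),
    "Heart": (0.53, 0.53),
}


def ppv(tp, fp):
    """Positive predictive value TP / (TP + FP); 0.0 when nothing is predicted."""
    predicted = tp + fp
    return tp / predicted if predicted else 0.0


def column_average(table, col):
    """Arithmetic mean of one statistic's PPV over all tissues."""
    values = [row[col] for row in table.values()]
    return sum(values) / len(values)


ep2_avg = round(column_average(table4, 0), 2)
n2_avg = round(column_average(table4, 1), 2)
print(ep2_avg, n2_avg)  # 0.53 0.52, matching the Average row
```

Rounding to two decimals reproduces the reported averages (0.53 for EP₂ and 0.52 for N₂), which is consistent with a plain unweighted mean over tissues.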
4 CONCLUSIONS
In this paper we studied the use of alignment-free
measures to detect functional and/or evolutionary
similarities among regulatory sequences. We introduced
a multi-resolution alignment-free method
based on Entropic Profiles that is designed around the
use of variable-length words combined with statistical
properties based on Information Theory. To evaluate
the performance of several alignment-free methods,
we devised a series of tests on both synthetic
and real data. In almost all simulations our method
EP₂ outperforms all other statistics. Importantly, EP₂
is also able to detect similarities between enhancer
sequences identified in vivo, e.g. in mouse. This
will help to better understand the sequence-dependent
code within CRMs, which is responsible for the large
diversity of cell types.
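As a rough illustration of the per-position Entropic Profile on which the method builds (not the EP₂ similarity statistic itself, whose definition is not restated in this section), the Python sketch below follows our reading of the formulation in Fernandes et al. (2009). Each position i is scored by the counts of the words of length 1..L that end at i, weighted by (4φ)^k, and the scores are then z-normalized across positions. The function names, the handling of positions near the start of the sequence, and the use of the population standard deviation are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def entropic_profile(seq, L, phi):
    """Raw entropic profile value for every 0-based position i of a DNA string.

    Assumed formulation (our reading of Fernandes et al., 2009):
        f(i) = (1 + (1/n) * sum_{k=1..L} (4*phi)**k * c(seq[i-k+1 : i+1]))
               / sum_{k=0..L} phi**k
    where c(w) is the number of occurrences of word w in seq and 4 is the
    DNA alphabet size. Near the start of the sequence only the words that
    fit (k <= i + 1) are summed; this boundary rule is an assumption.
    """
    n = len(seq)
    # Count every word of length 1..L once, up front.
    counts = Counter(seq[j:j + k] for k in range(1, L + 1) for j in range(n - k + 1))
    denom = sum(phi ** k for k in range(L + 1))
    profile = []
    for i in range(n):
        total = 0.0
        # Words of increasing length k that end at position i.
        for k in range(1, min(L, i + 1) + 1):
            total += (4 * phi) ** k * counts[seq[i - k + 1:i + 1]]
        profile.append((1 + total / n) / denom)
    return profile


def normalized_entropic_profile(seq, L, phi):
    """Z-score the raw profile across positions (population std, an assumption)."""
    f = entropic_profile(seq, L, phi)
    m, s = mean(f), pstdev(f)
    # A constant profile carries no positional signal; map it to zeros.
    return [0.0 if s == 0 else (x - m) / s for x in f]


raw = entropic_profile("AAAA", L=2, phi=1.0)
z = normalized_entropic_profile("AAAA", L=2, phi=1.0)
```

On `AAAA` with L = 2 and φ = 1, position 0 can only be covered by a single-letter word, so it scores lower (5/3) than the remaining positions (17/3), and its normalized value is negative.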
ACKNOWLEDGEMENTS
M. Comin was partially supported by the P.R.I.N.
Project 20122F87B2.
REFERENCES
Altschul, S., Gish, W., Miller, W., Myers, E., and Lipman,
D. (1990). Basic local alignment search tool. J. Mol.
Biol., 215:403–410.
Blaisdell, B. (1986). A measure of the similarity of sets
of sequences not requiring sequence alignment. Proc.
Natl Acad. Sci., 83:5155–5159.
Blow, M. et al. (2010). ChIP-seq identification of
weakly conserved heart enhancers. Nature Genetics,
42(9):806–810.
Comin, M. and Antonello, M. (2013). Fast computation of
entropic profiles for the detection of conservation in
genomes. In Proceedings of Pattern Recognition in
Bioinformatics, Lecture Notes in Bioinformatics (LNBI),
volume 7986, pages 277–288.
Comin, M. and Antonello, M. (2014). Fast entropic
profiler: An information theoretic approach for the
discovery of patterns in genomes. IEEE/ACM
Transactions on Computational Biology and Bioinformatics,
11(3):500–509.
Comin, M., Leoni, A., and Schimd, M. (2014). Qcluster:
Extending alignment-free measures with quality
values for reads clustering. Algorithms in Bioinformatics,
Lecture Notes in Computer Science, 8701:1–13.
Comin, M. and Schimd, M. (2014). Assembly-free
genome comparison based on next-generation
sequencing reads and variable length patterns. BMC
Bioinformatics, 15(Suppl 9):S1.
Comin, M. and Verzotto, D. (2010). Classification of
protein sequences by means of irredundant patterns. BMC
Bioinformatics, 11(Suppl 1):S16.
Comin, M. and Verzotto, D. (2011). The irredundant
class method for remote homology detection of
protein sequences. Journal of Computational Biology,
18(12):1819–1829.
Comin, M. and Verzotto, D. (2014). Beyond fixed-resolution
alignment-free measures for mammalian
enhancers sequence comparison. IEEE/ACM
Transactions on Computational Biology and Bioinformatics,
11(4):628–637.
Fernandes, F., Freitas, A., Almeida, J., and Vinga, S.
(2009). Entropic Profiler - detection of conservation
in genomes using information theory. BMC Research
Notes, 2:72.
Foret, S., Wilson, S., and Burden, C. (2009). Characterising
the D2 statistic: word matches in biological sequences.
Stat. Appl. Genet. Mol. Biol., 8(43).
Göke, J., Schulz, M., Lasserre, J., and Vingron, M. (2012).
Estimation of pairwise sequence similarity of
mammalian enhancers with word neighbourhood counts.
Bioinformatics, 28(5):656–663.
Kantorovitz, M., Robinson, G., and Sinha, S. (2007). A
statistical method for alignment-free comparison of
regulatory sequences. Bioinformatics, 23(13):i249–i255.
Liu, X., Wan, L., Reinert, G., Waterman, M., Sun, F., and
Li, J. (2011). New powerful statistics for alignment-
free sequence comparison under a pattern transfer
model. 1:106–116.
Reinert, G., Chew, D., Sun, F., and Waterman, M. S.
(2009). Alignment-free sequence comparison (i):
statistics and power. Journal of Computational
Biology, 16(12):1615–1634.
Robin, S. et al. (2005). DNA, Words and Models: Statistics
of Exceptional Words. Cambridge University Press.
Shlyueva, D., Stampfel, G., and Stark, A. (2014).
Transcriptional enhancers: from properties to genome-wide
predictions. Nature Reviews Genetics,
15:272–286.
Smith, T. and Waterman, M. (1981). Comparison of
biosequences. Adv. Appl. Math., 2:482–489.
BIOINFORMATICS 2015 - International Conference on Bioinformatics Models, Methods and Algorithms