regions within chromosome ChrN. Let
$S_k(N) = (\mathrm{ChrN}_k[i_1, j_1], \ldots, \mathrm{ChrN}_k[i_{p_N}, j_{p_N}])$
denote the sequence of non-conserved regions in left-to-right
order in chromosome $\mathrm{ChrN}_k$ of the $k$th strain. For
each $k = 1, \ldots, 39$, and for each $N = 1, \ldots, 16$, we
construct a binary vector $\hat{S}_k(N)$ of the same size as
$S_k(N)$, where the $n$th component is '0' if the segment
$\mathrm{ChrN}_k[i_n, j_n]$ is smaller than the user-supplied size
threshold $d$, and '1' otherwise. We use default thresholds
$d = 4000$ (called Ty-c threshold) and $d = 300$
(called LTR-c threshold) according to whether we
want to detect just Tys or also LTRs, respectively.
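As a minimal sketch of the thresholding step just described, the Python snippet below maps a list of non-conserved regions to a binary vector. The function name `binarize_regions`, the toy coordinates, and the convention that the size of a region [i, j] is j - i + 1 are illustrative assumptions, not details taken from the original implementation.

```python
from typing import List, Sequence, Tuple

TY_C_THRESHOLD = 4000   # default d when detecting Tys only (value from the text)
LTR_C_THRESHOLD = 300   # default d when detecting LTRs as well (value from the text)


def binarize_regions(regions: Sequence[Tuple[int, int]], d: int) -> List[int]:
    """Return 0 for each region smaller than d and 1 otherwise."""
    # Assumption: a region [i, j] is a closed interval, so its size is j - i + 1.
    return [0 if (j - i + 1) < d else 1 for i, j in regions]


# Hypothetical non-conserved regions of one chromosome (sizes 151, 401, 6001).
regions = [(100, 250), (1000, 1400), (5000, 11000)]

ty_vector = binarize_regions(regions, TY_C_THRESHOLD)
ltr_vector = binarize_regions(regions, LTR_C_THRESHOLD)
```

With the Ty-c threshold only the largest region is marked, while the LTR-c threshold also marks the 401-base region.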
Finally, let
$\hat{S}_k = \hat{S}_k(1)\,\hat{S}_k(2) \cdots \hat{S}_k(16)$
be the binary sequence corresponding to the concatenation of
the 16 chromosomes of the kth strain. Note that,
because each chromosome ChrN has the same number of
conserved regions in every strain, it also has the same
number of non-conserved regions in every strain.
It follows that the 39 binary vectors
$\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{39}$
have the same size. Using them as representatives
of the mobilome topology of the corresponding
39 yeast strains, we apply the clustering
package of the scipy scientific library (Jones et al.,
2001) to perform a hierarchical clustering. The chosen
metric is the Hamming distance, while the selected
linkage method is UPGMA. In this way, we generate
a tree which we call the mobilome tree.
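The concatenation and clustering steps can be sketched with scipy as follows. This is only an illustration under stated assumptions: the strain names and binary vectors are hypothetical toy data (four strains with three chromosomes each, instead of 39 strains with 16 chromosomes), and UPGMA is expressed through scipy's `average` linkage method.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical toy input: per-chromosome binary vectors for each strain.
per_chromosome = {
    "strain_a": [[0, 1], [0, 0, 1], [0]],
    "strain_b": [[0, 1], [0, 0, 1], [1]],
    "strain_c": [[1, 0], [1, 1, 0], [0]],
    "strain_d": [[1, 0], [1, 1, 0], [1]],
}

strain_names = list(per_chromosome)

# Concatenate the chromosome vectors of each strain into one binary vector.
vectors = np.array(
    [np.concatenate(per_chromosome[name]) for name in strain_names]
)

# Pairwise Hamming distances (fraction of positions that differ).
distances = pdist(vectors, metric="hamming")

# UPGMA corresponds to average linkage in scipy.
tree = linkage(distances, method="average")

# Cut the tree into two clusters to inspect the grouping of strains.
labels = fcluster(tree, t=2, criterion="maxclust")
```

The linkage matrix `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the resulting tree.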
The resulting mobilome tree reveals the clusters
among strains obtained by minimizing the movements
of PMEs. It is instructive to compare the mobilome
tree with the tree obtained by standard phylogenetic
approaches based on SNP comparison over
a set of suitably identified genes (Liti et al., 2009).
Almost all of the clades determined by the two trees
coincide: this would support the recently established
paradigm that Tys are able to drive the evolution of
organisms (Kazian, 2004). Notably, the amount
of information needed for our approach is minimal
and can be obtained a priori.
An interesting side observation is that the
mobilome tree does not change when the binary vectors
$\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{39}$
are built with the LTR-c threshold $d = 300$
(rather than the Ty-c threshold): in this case too,
the large majority of clades are identical to
those of the classical phylogenetic tree.
REFERENCES
Bennetzen, J. (2000). Transposable elements contribution to plant gene and genome evolution. Plant Mol Biol, 42:251–269.
Bourque, G. (2009). Transposable elements in gene regulation and in the evolution of vertebrate genomes. Current Opinion in Genetics and Development, 19:607–612.
Britten, R. (2010). Transposable element insertions have strongly affected human evolution. PNAS, 107:19945–19948.
Di Rienzi, S., Collingwood, D., Raghuraman, M., and Brewer, B. (2010). Fragile genomic sites are associated with origins of replication. Genome Biology and Evolution, 1(0):350.
Fachinetti, D., Bermejo, R., Cocito, A., Minardi, S., Katou, Y., Kanoh, Y., Shirahige, K., Azvolinsky, A., Zakian, V., and Foiani, M. (2010). Replication Termination at Eukaryotic Chromosomes Is Mediated by Top2 and Occurs at Genomic Loci Containing Pausing Elements. Molecular Cell, 39(4):595–605.
Gerton, J., DeRisi, J., Shroff, R., Lichten, M., Brown, P., and Petes, T. (2000). Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences of the United States of America, 97(21):11383.
Jones, E., Oliphant, T., Peterson, P., et al. (2001). SciPy: Open source scientific tools for Python.
Kazian, H. H. (2004). Mobile elements: Drivers of genome evolution. Science, 303:1626–1632.
Kidwell, M. G. and Lisch, D. R. (2001). Perspective: transposable elements, parasitic DNA, and genome evolution. Evolution, 55:1–24.
Koszul, R., Caburet, S., Dujon, B., and Fischer, G. (2004). Eucaryotic genome evolution through the spontaneous duplication of large chromosomal segments. EMBO J., 23:234–243.
Leonardo, T. and Nuzhdin, S. (2002). Mobile elements and disease. Genet Res, 80:155–161.
Liti, G., Carter, D. M., Moses, A. M., et al. (2009). Population genomics of domestic and wild yeast. Nature, 458:337–341.
Menconi, G., Battaglia, G., Grossi, R., Pisanti, N., and Marangoni, R. (2011). Inferring mobile elements in S. cerevisiae strains. In International Conference on Bioinformatics Models, Methods and Algorithms. SciTePress. ISBN 978-989-8425-36-2.
Rankin, D., Bichsel, M., and Wagner, A. (2010). Mobile DNA can drive lineage extinction in prokaryotic populations. Journal of Evolutionary Biology, 23:2422–2431.
SGD (2010). Saccharomyces Genome Database. http://www.yeastgenome.org/.
Shao, Y., He, X., Harrison, E. M., Tai, C., Ou, H.-Y., Rajakumar, K., and Deng, Z. (2010). mGenomeSubtractor: a web-based tool for parallel in silico subtractive hybridization analysis of multiple bacterial genomes. Nucleic Acids Research, 38:W194–W200.
Siefert, J. L. (2009). Defining the mobilome. Methods in Molecular Biology, 532:13–27.
Szilard, R., Jacques, P., Laramée, L., Cheng, B., Galicia, S., Bataille, A., Yeung, M., Mendez, M., Bergeron, M., Robert, F., et al. (2010). Systematic identification of fragile sites via genome-wide location analysis of γ-H2AX. Nature Structural & Molecular Biology.
BIOINFORMATICS 2012 - International Conference on Bioinformatics Models, Methods and Algorithms