ies as well as some basic properties of the identified potential triplex sequences (PTS). We found that most of the triplex-forming potential of the human genome is concentrated in simple repeats and in the flanking regions of repetitive and other genome elements derived from 7SL RNA, especially Alu and SVA repeats. We also found potential triplex-forming sequences in the miRNA class of RNA genes. Alu elements are known to contain or be flanked by adenine homonucleotide tracts, which substitute for polyadenylation of their RNA but could also carry out a DNA-based function involving H-DNA formation.
We propose a computational rule to automatically classify triplex-forming sequences according to the most prevalent k-mer in their sequence. To avoid ambiguity, we take the lexicographically minimal rotation of that k-mer before assigning the class name. Applying this rule, we find that the majority of human PTS (94.1% in total) fall into four main classes based on their nucleotide composition: T/A (45.8%), CT/GA (20.6%), CTT/GAA (14.6%) and CCT/GGA (13.1%).
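The classification rule above can be illustrated with a minimal Python sketch. It is not the implementation used in the triplex package, and several details are assumptions: k-mers are grouped by their minimal rotation before counting (a variant of "most prevalent k-mer, then rotate"), k is chosen from 1 to 3 as the length whose top rotation class covers the largest share of k-mer windows (smallest k on ties), and the class is named from the pyrimidine-rich strand so that the output takes the form of names such as CT/GA and CTT/GAA. The function names (min_rotation, classify_pts, etc.) are hypothetical.

```python
from collections import Counter

# Base-wise complement table (ACGT only is assumed); e.g. CTT -> GAA, not reversed.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def min_rotation(word: str) -> str:
    """Lexicographically minimal rotation, e.g. TCC -> CCT (O(k^2), fine for small k)."""
    return min(word[i:] + word[:i] for i in range(len(word)))


def complement(word: str) -> str:
    return word.translate(COMPLEMENT)


def top_rotation_class(seq: str, k: int) -> tuple[str, float]:
    """Most frequent k-mer after grouping all overlapping k-mers by their
    minimal rotation, plus the fraction of k-mer windows it covers."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(min_rotation(w) for w in windows)
    # Highest count wins; ties go to the lexicographically smaller k-mer.
    kmer, n = min(counts.items(), key=lambda item: (-item[1], item[0]))
    return kmer, n / len(windows)


def classify_pts(seq: str, max_k: int = 3) -> str:
    """Hypothetical PTS class name such as 'T/A' or 'CTT/GAA'."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    best_kmer, best_frac = "", -1.0
    for k in range(1, min(max_k, len(seq)) + 1):
        kmer, frac = top_rotation_class(seq, k)
        if frac > best_frac:  # strict '>' keeps the smallest k on ties
            best_kmer, best_frac = kmer, frac
    # Name the class from the pyrimidine-rich strand, so (GA)n and (CT)n
    # tracts share one class; a purine k-mer is complemented and re-rotated.
    pyrimidines = sum(base in "CT" for base in best_kmer)
    if pyrimidines < len(best_kmer) - pyrimidines:
        best_kmer = min_rotation(complement(best_kmer))
    return f"{best_kmer}/{complement(best_kmer)}"


for s in ["AAAAAAAA", "GAGAGAGA", "CTTCTTCTTCTT", "GGAGGAGGA"]:
    print(s, classify_pts(s))
```

For example, a (GA)n tract and a (CT)n tract both land in the CT/GA class, because the purine k-mer AG is complemented to TC and rotated to CT before naming. The simple quadratic rotation search is adequate for k of at most 3; for long words, Booth's algorithm finds the minimal rotation in linear time.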
We also characterized the detected PTS based on deletions found in alignments of the third triplex strand to the DNA duplex, showing that deletions occur less frequently in the third strand, especially in antiparallel PTS.
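The strand comparison of deletions can be sketched in the same way. The sketch assumes, hypothetically, that each alignment is available as a pair of equal-length gapped strings (third-strand row, duplex row) with '-' marking gaps; this representation and the helper names are illustrative, not the output format of the triplex package. A run of consecutive gaps in the third-strand row is counted as one deletion event in the third strand, and likewise for the duplex row.

```python
import re
from collections import Counter

GAP_RUN = re.compile(r"-+")  # one or more consecutive gap characters


def count_deletions(third_aln: str, duplex_aln: str) -> dict[str, int]:
    """Count deletion events (runs of '-') in each row of one alignment.

    Assumed convention: a gap in the third-strand row marks a duplex base
    that is missing from the third strand, i.e. a deletion in the third strand.
    """
    if len(third_aln) != len(duplex_aln):
        raise ValueError("aligned rows must have equal length")
    return {
        "third_strand": len(GAP_RUN.findall(third_aln)),
        "duplex": len(GAP_RUN.findall(duplex_aln)),
    }


def tally_deletions(alignments) -> Counter:
    """Sum deletion events per strand over an iterable of (third, duplex) pairs."""
    totals = Counter()
    for third_aln, duplex_aln in alignments:
        totals.update(count_deletions(third_aln, duplex_aln))
    return totals


pairs = [("CTT--CTT-CTT", "CTTCTCTTTCTT"), ("CTTCTT", "CTT-TT")]
print(tally_deletions(pairs))
```

Computing these totals separately for parallel and antiparallel PTS would give the kind of per-strand comparison described above; counting gap runs rather than gap columns treats a multi-base deletion as a single event, which is a design choice of this sketch.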
In terms of biological relevance, our studies suggest that PTS are positioned non-randomly in the genome, that their sequences fall into a small number of distinct classes, and that some of them are associated with specific types of repeats. Their strand bias for insertions or deletions suggests that these sequences may indeed form the predicted structures. In the future, it would be desirable to single out specific combinations of repeat types and PTS classes, prove that triplexes form in each case, and systematically search for proteins that could interact with such structures, which would provide a more precise clue to their biological function.
ACKNOWLEDGEMENTS
This work was supported by research grants from the European Social Fund (CZ.1.07/2.3.00/20.0189) to M.L. and from the Grant Agency of Science of CR (13-36108S) to M.B., by the IT4Innovations project (reg. no. CZ.1.05/1.1.00/02.0070) funded by the EU Operational Programme, by MSMT Grant No. 0021630528 and by BUT grant FIT-S-11-1.
Uneven Distribution of Potential Triplex Sequences in the Human Genome - In Silico Study using the R/Bioconductor Package Triplex