Authors: Matej Lexa 1; Tomáš Martínek 2 and Marie Brázdová 3
Affiliations: 1 Masaryk University, Czech Republic; 2 Brno Technical University, Czech Republic; 3 Czech Academy of Sciences, Czech Republic
Keyword(s):
Human Genome, DNA Sequence, Non-B-DNA, Triplex, Bioconductor, Repetitive Sequences, Mobile DNA, Lexicographically Minimal Rotation.
Related Ontology Subjects/Areas/Topics: Bioinformatics; Biomedical Engineering; Sequence Analysis
Abstract:
Eukaryotic genomes are rich in sequences capable of forming non-B DNA structures. These structures are
expected to play important roles in natural regulatory processes at levels above those of individual genes,
such as whole genome dynamics or chromatin organization, as well as in processes leading to the loss of
these functions, such as cancer development. Recently, a number of authors have mapped the occurrence of
potential quadruplex sequences in the human genome and found them to be associated with promoters. In
this paper, we set out to map the distribution and characteristics of potential triplex-forming sequences (PTS)
in the human genome sequence. Using the R/Bioconductor package triplex, we found these sequences to be
excluded from exons, while present mostly in a small number of repetitive sequence classes, especially short
sequence tandem repeats (microsatellites), Alu and combined elements, such as SVA. We also introduce a
novel way of classifying potential triplex sequences, using a lexicographically minimal rotation of the most
frequent k-mer to assign class membership automatically. Members of such classes typically have different
propensities to form parallel and antiparallel intramolecular triplexes (H-DNA). We observed an interesting
pattern: the predicted third strands of antiparallel H-DNA were much less likely than their parallel versions
to contain a deletion relative to their duplex structural counterpart.
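
The classification idea mentioned above (taking the lexicographically minimal rotation of the most frequent k-mer) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the triplex package; the function names, the default k, and the tie-breaking rule are assumptions made for illustration only.

```python
from collections import Counter


def most_frequent_kmer(seq, k):
    """Return the most frequent k-mer in seq (hypothetical helper).

    Ties are broken by lexicographic order so the result is deterministic;
    this tie-breaking rule is an assumption, not taken from the paper.
    """
    if k <= 0 or len(seq) < k:
        raise ValueError("sequence must be at least k characters long")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    best = max(counts.values())
    return min(kmer for kmer, count in counts.items() if count == best)


def minimal_rotation(s):
    """Return the lexicographically smallest rotation of s.

    A simple quadratic approach; adequate for short k-mers.
    """
    return min(s[i:] + s[:i] for i in range(len(s)))


def classify(seq, k=3):
    """Assign a class label: minimal rotation of the most frequent k-mer."""
    return minimal_rotation(most_frequent_kmer(seq.upper(), k))


# Two phase-shifted copies of the same tandem repeat share one label.
print(classify("GAAGAAGAAGAA"))
print(classify("AGAAGAAGAAGA"))
```

Using the minimal rotation as the label means that sequences built from the same repeat unit fall into one class regardless of where within the repeat the sequence happens to start.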