Uneven Distribution of Potential Triplex Sequences in the Human Genome - In Silico Study using the R/Bioconductor Package Triplex

Matej Lexa, Tomáš Martínek, Marie Brázdová

Abstract

Eukaryotic genomes are rich in sequences capable of forming non-B DNA structures. These structures are expected to play important roles in natural regulatory processes at levels above those of individual genes, such as whole-genome dynamics or chromatin organization, as well as in processes leading to the loss of these functions, such as cancer development. Recently, a number of authors have mapped the occurrence of potential quadruplex sequences in the human genome and found them to be associated with promoters. In this paper, we set out to map the distribution and characteristics of potential triplex-forming sequences (PTS) in the human genome sequence. Using the R/Bioconductor package triplex, we found these sequences to be excluded from exons and present mostly in a small number of repetitive sequence classes, especially short tandem repeats (microsatellites), Alu elements, and combined elements such as SVA. We also introduce a novel way of classifying potential triplex sequences, using the lexicographically minimal rotation of the most frequent k-mer to assign class membership automatically. Members of such classes typically have different propensities to form parallel and antiparallel intramolecular triplexes (H-DNA). We observed an interesting pattern: the predicted third strands of antiparallel H-DNA were much less likely than those of parallel H-DNA to contain a deletion relative to their duplex structural counterpart.
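
The classification idea mentioned in the abstract (class label = lexicographically minimal rotation of the most frequent k-mer) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the default k = 4, the use of overlapping windows, and the lexicographic tie-breaking rule are all assumptions made for illustration only.

```python
from collections import Counter


def minimal_rotation(s: str) -> str:
    """Return the lexicographically smallest rotation of s (naive O(n^2) version)."""
    if not s:
        return s
    return min(s[i:] + s[:i] for i in range(len(s)))


def most_frequent_kmer(seq: str, k: int) -> str:
    """Return the most frequent k-mer over overlapping windows.

    Ties are broken lexicographically (an assumption of this sketch).
    """
    if k <= 0 or len(seq) < k:
        raise ValueError("k must be positive and no longer than the sequence")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    best = max(counts.values())
    return min(kmer for kmer, count in counts.items() if count == best)


def pts_class(seq: str, k: int = 4) -> str:
    """Hypothetical class label: minimal rotation of the most frequent k-mer."""
    return minimal_rotation(most_frequent_kmer(seq.upper(), k))


if __name__ == "__main__":
    # Two phase-shifted copies of the same repeat receive the same label.
    print(pts_class("GAAGAAGAAGAA", k=3))
    print(pts_class("AGAAGAAGAAGA", k=3))
```

Taking the minimal rotation makes the label independent of where in the repeat unit a detected sequence happens to start, so phase-shifted instances of one microsatellite fall into a single class.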


Paper Citation


in Harvard Style

Lexa M., Martínek T. and Brázdová M. (2014). Uneven Distribution of Potential Triplex Sequences in the Human Genome - In Silico Study using the R/Bioconductor Package Triplex. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014) ISBN 978-989-758-012-3, pages 80-88. DOI: 10.5220/0004824100800088


in Bibtex Style

@conference{bioinformatics14,
author={Matej Lexa and Tomáš Martínek and Marie Brázdová},
title={Uneven Distribution of Potential Triplex Sequences in the Human Genome - In Silico Study using the R/Bioconductor Package Triplex},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)},
year={2014},
pages={80-88},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004824100800088},
isbn={978-989-758-012-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)
TI - Uneven Distribution of Potential Triplex Sequences in the Human Genome - In Silico Study using the R/Bioconductor Package Triplex
SN - 978-989-758-012-3
AU - Lexa M.
AU - Martínek T.
AU - Brázdová M.
PY - 2014
SP - 80
EP - 88
DO - 10.5220/0004824100800088