Fast and Accurate cDNA Mapping and Splice Site Identification

Michaël Vyverman, Dieter De Smedt, Yao-Cheng Lin, Lieven Sterck, Bernard De Baets, Veerle Fack, Peter Dawyndt

Abstract

Mapping and aligning cDNA sequences that contain splice sites is an algorithmically and computationally challenging task. Most recently developed spliced aligners are designed for mapping short reads and sacrifice sensitivity for increased performance. We present mesalina, a highly accurate spliced aligner that can also detect novel non-canonical splice sites and whose performance is more robust to increasing read length. Mesalina uses the seed-extend strategy, combining fast retrieval of maximal exact matches with a sensitive sandwich dynamic programming algorithm. Preliminary results indicate that mesalina is accurate and very fast, especially for mapping longer reads. In particular, it is more than ten times faster than mappers with comparable accuracy. Mesalina is available from https://github.ugent.be/ComputationalBiology/mesalina.
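
The seed step of a seed-extend strategy can be illustrated with a small sketch. It is not mesalina's implementation: it uses a naive quadratic scan in place of the fast, index-based retrieval the abstract refers to, and it omits the dynamic programming extension entirely. The function names, the toy sequences, and the `min_len` threshold are all hypothetical and chosen only for illustration.

```python
def maximal_exact_matches(read, ref, min_len):
    """Return (read_pos, ref_pos, length) for every maximal exact match
    of at least min_len characters (naive O(|read| * |ref|) scan)."""
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Left-maximal: skip starts that could be extended to the left.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Right-maximal: extend while the characters agree.
            k = 0
            while (i + k < len(read) and j + k < len(ref)
                   and read[i + k] == ref[j + k]):
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


def implied_intron(left, right):
    """If two seeds are adjacent in the read but separated in the
    reference, return the reference interval between them, else None."""
    (read_a, ref_a, len_a), (read_b, ref_b, _) = left, right
    if read_a + len_a == read_b and ref_b > ref_a + len_a:
        return (ref_a + len_a, ref_b)
    return None


# Toy data (assumed): two exons separated by a GT...AG intron.
exon1, intron, exon2 = "ACGTTGCA", "GTAAAAAG", "TTCGGATC"
ref = exon1 + intron + exon2      # genomic sequence
read = exon1 + exon2              # spliced cDNA read

seeds = maximal_exact_matches(read, ref, min_len=8)
gap = implied_intron(seeds[0], seeds[1])
print(seeds)
print(ref[gap[0]:gap[1]])
```

On this toy input the two seeds cover the two exons, and the reference gap between them is the intron. Real reads are harder: a few bases at an exon boundary often match on both sides of the junction, so the exact splice position is ambiguous from seeds alone and has to be settled by a more sensitive alignment step.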


Paper Citation


in Harvard Style

Vyverman M., De Smedt D., Lin Y., Sterck L., De Baets B., Fack V. and Dawyndt P. (2014). Fast and Accurate cDNA Mapping and Splice Site Identification. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014) ISBN 978-989-758-012-3, pages 233-238. DOI: 10.5220/0004903502330238


in Bibtex Style

@conference{bioinformatics14,
author={Michaël Vyverman and Dieter De Smedt and Yao-Cheng Lin and Lieven Sterck and Bernard De Baets and Veerle Fack and Peter Dawyndt},
title={Fast and Accurate cDNA Mapping and Splice Site Identification},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)},
year={2014},
pages={233-238},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004903502330238},
isbn={978-989-758-012-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)
TI - Fast and Accurate cDNA Mapping and Splice Site Identification
SN - 978-989-758-012-3
AU - Vyverman M.
AU - De Smedt D.
AU - Lin Y.
AU - Sterck L.
AU - De Baets B.
AU - Fack V.
AU - Dawyndt P.
PY - 2014
SP - 233
EP - 238
DO - 10.5220/0004903502330238