De Novo Short Read Assembly Algorithm with Low Memory Usage

Yuki Endo, Fubito Toyama, Chikafumi Chiba, Hiroshi Mori, Kenji Shoji

2014

Abstract

Determining the whole genome sequences of various species has many applications, not only in biology but also in medicine, pharmacy and agriculture. In recent years, the emergence of high-throughput next-generation sequencing technologies has dramatically reduced the time and cost of whole genome sequencing. These new technologies provide ultrahigh throughput at a lower cost per unit of data. However, the data they produce are very short fragments of DNA, so algorithms for merging these fragments are very important. Merging the fragments without reference data is called de novo assembly. Many algorithms for de novo assembly have been proposed in recent years. One of them, Velvet, is well known for its good performance in terms of memory and time consumption. However, its memory consumption increases dramatically when the input fragment set is very large. It is therefore necessary to develop an algorithm with low memory usage. In this paper, we propose an algorithm for de novo assembly that uses less memory. In our experiments using E. coli K-12 strain MG1655, the memory consumption of the proposed algorithm was one-third of that of Velvet.
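To make the idea of merging short fragments without a reference concrete, the following is a minimal sketch of a de Bruijn graph assembly, the general approach used by Velvet. It is not the algorithm proposed in this paper, which the abstract does not describe. The function names `build_de_bruijn` and `assemble_linear`, the choice of k = 4, and the three toy reads are all assumptions made for illustration, and the walker only handles a graph that forms a single unbranched path.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_linear(graph):
    """Spell out the contig of a graph that is one unbranched path (toy case only)."""
    successors = {s for targets in graph.values() for s in targets}
    starts = [node for node in graph if node not in successors]
    if len(starts) != 1:
        raise ValueError("toy walker expects exactly one start node")
    node = starts[0]
    contig = node
    visited = {node}
    # Follow the path while the current node has exactly one outgoing edge.
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in visited:
            break  # guard against cycles in this simplified walker
        visited.add(node)
        contig += node[-1]
    return contig


# Hypothetical overlapping reads, made up for illustration.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
contig = assemble_linear(build_de_bruijn(reads, k=4))
print(contig)  # ATGGCGTGCAAT
```

A real assembler must also cope with sequencing errors, repeats and both DNA strands. The in-memory graph built here is the kind of structure whose size grows with the input, which is the cost the abstract is concerned with.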

Paper Citation


in Harvard Style

Endo Y., Toyama F., Chiba C., Mori H. and Shoji K. (2014). De Novo Short Read Assembly Algorithm with Low Memory Usage. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014) ISBN 978-989-758-012-3, pages 215-220. DOI: 10.5220/0004881002150220


in Bibtex Style

@conference{bioinformatics14,
author={Yuki Endo and Fubito Toyama and Chikafumi Chiba and Hiroshi Mori and Kenji Shoji},
title={De Novo Short Read Assembly Algorithm with Low Memory Usage},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)},
year={2014},
pages={215-220},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004881002150220},
isbn={978-989-758-012-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)
TI - De Novo Short Read Assembly Algorithm with Low Memory Usage
SN - 978-989-758-012-3
AU - Endo Y.
AU - Toyama F.
AU - Chiba C.
AU - Mori H.
AU - Shoji K.
PY - 2014
SP - 215
EP - 220
DO - 10.5220/0004881002150220