De Novo Short Read Assembly Algorithm with Low Memory Usage
Yuki Endo, Fubito Toyama, Chikafumi Chiba, Hiroshi Mori, Kenji Shoji
2014
Abstract
Determining the whole genome sequences of various species has many applications, not only in biology but also in medicine, pharmacy, and agriculture. In recent years, the emergence of high-throughput next-generation sequencing technologies has dramatically reduced the time and cost of whole genome sequencing. These new technologies provide ultrahigh throughput at a lower unit data cost. However, the data they produce are very short fragments of DNA, so developing algorithms for merging these fragments is very important. Merging the fragments without reference data is called de novo assembly. Many algorithms for de novo assembly have been proposed in recent years. Velvet, one of these algorithms, is well known because it performs well in terms of memory and time consumption. However, its memory consumption increases dramatically when the set of input fragments is very large. It is therefore necessary to develop an algorithm with low memory usage. In this paper, we propose an algorithm for de novo assembly that uses less memory. In our experiments using E. coli K-12 strain MG1655, the memory consumption of the proposed algorithm was one-third of that of Velvet.
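To illustrate what merging fragments without reference data means, the following is a minimal sketch of greedy overlap merging on a toy input. It is not the algorithm proposed in this paper, nor the de Bruijn graph approach used by Velvet; the function names `overlap` and `greedy_assemble`, the `min_len` parameter, and the example reads are all assumptions made for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a
    prefix of `b`, or 0 if no such overlap of at least `min_len` exists."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Illustrative only: this toy version is quadratic per merge step and
    ignores sequencing errors, reverse complements, and repeats.
    """
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i == j:
                    continue
                n = overlap(a, b, min_len)
                if n > best_len:
                    best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: the rest stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical 6-base reads sampled from the toy sequence ATGCGTACCTGA.
toy_reads = ["ATGCGT", "GCGTAC", "GTACCT", "ACCTGA"]
contigs = greedy_assemble(toy_reads)
print(contigs)
```

On this toy input the four reads collapse into a single contig. Every read is held in memory and compared with every other read, which hints at why memory and time become the limiting factors as the number of input fragments grows.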
References
- Butler, J., MacCallum, I., Kleber, M., Shlyakhter, I. A., Belmonte, M. K., Lander, E. S., Nusbaum, C., and Jaffe, D. B. (2008). ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res., 18(5):810-820.
- Chevreux, B., Pfisterer, T., Drescher, B., Driesel, A. J., Muller, W. E., Wetter, T., and Suhai, S. (2004). Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res., 14(6):1147-1159.
- Hernandez, D., Francois, P., Farinelli, L., Osteras, M., and Schrenzel, J. (2008). De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res., 18(5):802-809.
- Jeck, W. R., Reinhardt, J. A., Baltrus, D. A., Hickenbotham, M. T., Magrini, V., Mardis, E. R., Dangl, J. L., and Jones, C. D. (2007). Extending assembly of short DNA sequences to handle error. Bioinformatics, 23(21):2942-2944.
- Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., Li, Y., Li, S., Shan, G., Kristiansen, K., Li, S., Yang, H., Wang, J., and Wang, J. (2010). De novo assembly of human genomes with massively parallel short read sequencing. Genome Res., 20(2):265-272.
- Miller, J. R., Delcher, A. L., Koren, S., Venter, E., Walenz, B. P., Brownley, A., Johnson, J., Li, K., Mobarry, C., and Sutton, G. (2008). Aggressive assembly of pyrosequencing reads with mates. Bioinformatics, 24(24):2818-2824.
- Simpson, J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones, S. J., and Birol, I. (2009). ABySS: a parallel assembler for short read sequence data. Genome Res., 19(6):1117-1123.
- Warren, R. L., Sutton, G. G., Jones, S. J., and Holt, R. A. (2007). Assembling millions of short DNA sequences using SSAKE. Bioinformatics, 23(4):500-501.
- Zerbino, D. R. and Birney, E. (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 18(5):821-829.
Paper Citation
in Harvard Style
Endo Y., Toyama F., Chiba C., Mori H. and Shoji K. (2014). De Novo Short Read Assembly Algorithm with Low Memory Usage. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014) ISBN 978-989-758-012-3, pages 215-220. DOI: 10.5220/0004881002150220
in Bibtex Style
@conference{bioinformatics14,
author={Yuki Endo and Fubito Toyama and Chikafumi Chiba and Hiroshi Mori and Kenji Shoji},
title={De Novo Short Read Assembly Algorithm with Low Memory Usage},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)},
year={2014},
pages={215-220},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004881002150220},
isbn={978-989-758-012-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)
TI - De Novo Short Read Assembly Algorithm with Low Memory Usage
SN - 978-989-758-012-3
AU - Endo Y.
AU - Toyama F.
AU - Chiba C.
AU - Mori H.
AU - Shoji K.
PY - 2014
SP - 215
EP - 220
DO - 10.5220/0004881002150220