solving branches and cycles is very simple in the proposed method. Thus, the maximum length and N50 length could be improved by modifying the path tracing algorithm. On the other hand, the average error rate of the proposed method (5.71%) was lower than that of Velvet (7.29%), while its genome coverage (99.59%) was slightly lower than that of Velvet (99.89%). These results indicate that the difference in path tracing algorithms does not cause large differences in contig quantity.
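N50 is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The short Python sketch below computes it from a list of contig lengths. The function name `n50` is hypothetical and is not part of the proposed method or of Velvet.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (bp).

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # The first contig at which the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 8
```

In the usage line, the total is 54 bp. The three longest contigs (10 + 9 + 8 = 27 bp) are the first to reach half of that total, so N50 = 8.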
Figure 10: Comparison of maximum memory consumption.
Figure 11: Comparison of running time.
Table 2: Comparison of quantity of contigs.
                  Maximum length (bp)   N50 (bp)   # of contigs   Total length (bp)
Proposed method   74,708                17,038     631            4,560,202
Velvet            81,421                22,870     754            4,544,229
Table 3: Comparison of precision of contigs.
                  Genome covered (%)   Average error rate (%)
Proposed method   99.59                5.71
Velvet            99.89                7.29
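The text does not state how "Genome covered (%)" in Table 3 was measured. A common definition, assumed here, is the percentage of reference positions covered by at least one aligned contig. The Python sketch below computes that percentage by merging overlapping alignment intervals. The function name `genome_coverage` and its input format are hypothetical. The input is a list of 0-based, half-open (start, end) intervals on a linear reference, which would come from a separate contig-to-reference alignment step.

```python
def genome_coverage(alignments, genome_length):
    """Percentage of reference positions covered by at least one contig.

    alignments: list of (start, end) half-open, 0-based intervals on a
    linear reference (assumed input, e.g. from aligning contigs).
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(alignments):
        if current_end is None or start > current_end:
            # Disjoint from the current merged block: count the block and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return 100.0 * covered / genome_length


print(genome_coverage([(0, 10), (5, 20), (30, 40)], 100))  # → 30.0
```

Merging the intervals first ensures that positions covered by overlapping contigs are counted only once.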
4 CONCLUSIONS
In this paper, we propose an algorithm for large-scale de novo assembly with low memory usage. In our experiments using E. coli K-12 strain MG1655, the maximum memory consumption of the proposed algorithm was one-third that of Velvet, and its running time was also shorter than that of Velvet. These results show that the proposed method outperformed Velvet in terms of memory consumption and running time. On the other hand, the contigs obtained by the proposed method were slightly worse than those of Velvet in maximum length, N50 length, and genome coverage, although their average error rate was lower. To improve the contigs, we need to modify the path tracing algorithm in future work.
REFERENCES
Butler, J., MacCallum, I., Kleber, M., Shlyakhter, I. A., Belmonte, M. K., Lander, E. S., Nusbaum, C., and Jaffe, D. B. (2008). ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res., 18(5):810–820.
Chevreux, B., Pfisterer, T., Drescher, B., Driesel, A. J., Muller, W. E., Wetter, T., and Suhai, S. (2004). Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res., 14(6):1147–1159.
Hernandez, D., Francois, P., Farinelli, L., Osteras, M., and Schrenzel, J. (2008). De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res., 18(5):802–809.
Jeck, W. R., Reinhardt, J. A., Baltrus, D. A., Hickenbotham, M. T., Magrini, V., Mardis, E. R., Dangl, J. L., and Jones, C. D. (2007). Extending assembly of short DNA sequences to handle error. Bioinformatics, 23(21):2942–2944.
Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., Li, Y., Li, S., Shan, G., Kristiansen, K., Li, S., Yang, H., Wang, J., and Wang, J. (2010). De novo assembly of human genomes with massively parallel short read sequencing. Genome Res., 20(2):265–272.
Miller, J. R., Delcher, A. L., Koren, S., Venter, E., Walenz, B. P., Brownley, A., Johnson, J., Li, K., Mobarry, C., and Sutton, G. (2008). Aggressive assembly of pyrosequencing reads with mates. Bioinformatics, 24(24):2818–2824.
Simpson, J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones, S. J., and Birol, I. (2009). ABySS: a parallel assembler for short read sequence data. Genome Res., 19(6):1117–1123.
Warren, R. L., Sutton, G. G., Jones, S. J., and Holt, R. A. (2007). Assembling millions of short DNA sequences using SSAKE. Bioinformatics, 23(4):500–501.
Zerbino, D. R. and Birney, E. (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 18(5):821–829.
BIOINFORMATICS 2014 - International Conference on Bioinformatics Models, Methods and Algorithms