We also observe that, in the case of MOBS, almost all
of the suffix-prefix overlaps among the remaining
assemblies are true overlaps. By a true overlap we mean
an overlap that is also present when the assemblies are
aligned against the genome. Thus, even though MOBS may
not report the best performance in terms of assembly
length, the suffix-prefix overlaps among its assemblies
can be used to generate larger assemblies.
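The notion of a true overlap can be illustrated with a minimal sketch. It assumes error-free assemblies taken from a single strand and uses exact string matching in place of a real aligner; the function names (`suffix_prefix_overlap`, `occurrences`, `is_true_overlap`) are hypothetical and are not part of MOBS.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def occurrences(genome: str, s: str) -> list[int]:
    """All start positions of `s` in `genome` (exact matches, overlaps allowed)."""
    positions = []
    i = genome.find(s)
    while i != -1:
        positions.append(i)
        i = genome.find(s, i + 1)
    return positions


def is_true_overlap(genome: str, a: str, b: str, min_len: int = 1) -> bool:
    """Check whether the suffix-prefix overlap of `a` and `b` also holds
    in the genome, i.e. some placement of `a` is followed by a placement
    of `b` shifted by exactly len(a) - overlap positions.

    Assumption: exact matching stands in for alignment against the genome.
    """
    k = suffix_prefix_overlap(a, b, min_len)
    if k == 0:
        return False
    starts_b = set(occurrences(genome, b))
    return any(p + len(a) - k in starts_b for p in occurrences(genome, a))
```

In this sketch, an overlap that arises only because two distant regions of the genome share a short substring is rejected, since the placements of the two assemblies on the genome are not adjacent.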
While MOBS runs reasonably fast, a time comparison is
not very meaningful, because all the other assemblers
that report faster times appear to be multi-threaded,
whereas MOBS at present has a single-threaded
implementation.
4 FUTURE WORK AND CONCLUSIONS
In this paper, we presented a method to generate
assemblies from short reads using only short overlaps.
This approach produces comparable results while reducing
the computational effort. There are many possibilities
for further improving the results obtained with this
approach. One is to generate only assemblies that are
not contained in others. Another is to develop
algorithms that generate larger assemblies. A third is
to modify our algorithm to handle the challenges of real
data, such as errors in reads and reads from both
strands of the genome.
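As an illustration of the first direction, the sketch below post-filters a set of assemblies so that none is a substring of another. It is a naive quadratic filter written for clarity, not the method proposed in this paper; the function name `remove_contained` is hypothetical, and it assumes error-free, same-strand sequences.

```python
def remove_contained(assemblies: list[str]) -> list[str]:
    """Return the assemblies that are not substrings of any other assembly.

    Duplicates are collapsed first. Candidates are processed from longest
    to shortest, so any string that could contain the current one has
    already been examined.
    """
    unique = sorted(set(assemblies), key=lambda s: (-len(s), s))
    kept: list[str] = []
    for s in unique:
        # If `s` lies inside a discarded string, it also lies inside the
        # kept string that caused that discard, so checking `kept` suffices.
        if not any(s in t for t in kept):
            kept.append(s)
    return kept
```

For large inputs, the pairwise substring test would presumably need to be replaced by an index-based approach; this version only shows the intended result.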
The comparisons given here are only indicative of the
promise of the approach and should not be taken as the
final word, since some of the assemblers used in the
comparison do not provide an option to set the error
model. We are working to extend this technique, and a
full and final version will report results on real
data.
ACKNOWLEDGEMENT
This work is part of the ongoing research program on
de novo genome assembly of Prof. S.N. Maheshwari at
IIT Delhi. We thank Prof. Maheshwari for his guidance
and support. We are also grateful to Prof. Sanjiva
Prasad for useful discussions. This work has been
partly supported by his project “Foundations of Trusted
and Scalable ‘Last-Mile’ Healthcare”, funded by DeitY,
Government of India.