functionality, and is available directly to the
application developer or may be accessed by the
user via the included Sequence Assembler
demonstration application (Fig. 6). The Sequence
Assembler application is a GUI-based interface to a
range of MBF functions and uses rich user interface
elements to enable visualization and manipulation of
genomic data. The user can perform assembly,
alignment and multiple sequence alignment of DNA,
RNA and protein sequences, visualizing the output
in a graphical alignment display built using the
Windows Presentation Foundation and Silverlight.
The Sequence Assembler also provides a connector
to various BLAST (Altschul et al., 1997) web services,
which can be used to characterize an assembled
sequence using public databases.
While our initial results are promising, further work
is needed to improve the quality and utility of the
assembled output, especially for large genomes.
Nonetheless, PadeNA can currently be used to
assemble bacterial genomes on shared-memory
architectures, and each step can be customized to
handle datasets with different characteristics or to
better meet the needs of different groups of
scientific users.
ACKNOWLEDGEMENTS
We would like to thank the Aditi-Microsoft MBF
Engineering team for their continued support in
making the design and technical implementation of
this de novo assembler deep, robust and of very high
quality. We would also like to thank Steve Jones,
Inanc Birol and other staff at Canada’s Michael
Smith Genome Center for their kind assistance in
understanding the field of genomics. Last but not
least, a very special thanks to Prasanth Koorma for
his constant motivation and encouragement
throughout the project.
REFERENCES
Altschul Stephen F., Madden Thomas L., Schaffer
Alejandro A., Zhang Jinghui, Zhang Zheng, Miller
Webb, & Lipman David J., 1997, ‘Gapped BLAST and
PSI-BLAST: a new generation of protein database
search programs’, Nucleic Acids Res., 25:3389–3402.
Batzoglou S., Jaffe D.B., Stanley K., Butler J., Gnerre S.,
Mauceli E., Berger B., Mesirov J. P., & Lander E. S.,
2002, ‘ARACHNE: a whole-genome shotgun
assembler’, Genome Research, 12:177–189.
Biswas Surupa 2006, The Performance Benefits of NGen,
Viewed July 5th 2010,
<http://msdn.microsoft.com/en-us/magazine/cc163610.aspx>
Butler J., MacCallum I., Kleber M., Shlyakhter I. A.,
Belmonte M. K., Lander E. S., Nusbaum C. N., &
Jaffe D. B., 2008, ‘ALLPATHS: De novo assembly of
whole-genome shotgun microreads’, Genome
Research, 18:810–820.
Chaisson M.J. & Pevzner P.A., 2008, ‘Short fragment
assembly of bacterial genomes’, Genome Research,
18:324–330.
De Novo Assembly using Illumina reads – technical note:
Illumina sequencing, 2009, retrieved July 5th 2010,
<http://www.illumina.com/Documents/products/technotes/technote_denovo_assembly.pdf>
Green P., 1996, ‘Documentation for Phrap’, Technical
report, Genome Center, University of Washington.
Havlak P., Chen R., Durbin K. J., Egan A., & Ren Y.,
2003, ‘The atlas genome assembly system’, Genome
Research, 14:721–731.
Huang X. & Madan A., 1999, ‘CAP3: A DNA sequence
assembly program’, Genome Research, 9:868–877.
Huson Daniel H., Reinert Knut, & Myers Eugene W.,
2002, ‘The greedy path-merging algorithm for contig
scaffolding’, Journal of the ACM (JACM), 49(5).
Kurtz S., Phillippy A., Delcher A. L., Smoot M.,
Shumway M., Antonescu C., & Salzberg S. L., 2004,
‘Versatile and open software for comparing large
genomes’, Genome Biology.
Mono: Cross platform, open source .NET development
framework, 2004. Viewed July 5th 2010,
<http://mono-project.com/Main_Page>
Myers E. W., Sutton G. G., Delcher A. L., & Dew I. M.,
2000, ‘A whole-genome assembly of Drosophila’,
Science, 287(5461):2196–2204.
Pattison Ted 1999, Understanding Interface-based
Programming, Viewed July 5th 2010,
<http://msdn.microsoft.com/en-us/library/aa260635(VS.60).aspx>
Pevzner P. A., Tang H., & Waterman M. S., 2001, ‘An
eulerian path approach to DNA fragment assembly’,
Proceedings of the National Academy of Sciences,
98(17):9748–9753.
Pop M., Kosack D. S., & Salzberg S. L., 2004,
‘Hierarchical scaffolding with Bambus’, Genome
Research, 14(1):149–159.
Simpson J. T., Wong K., Jackman S. D., Schein J. E.,
Jones S. J., & Birol I., 2009, ‘ABySS: A parallel
assembler for short read sequence data’, Genome
Research.
Sutton G. G., White O., Adams M. D., & Kerlavage A. R.,
1995, ‘TIGR assembler: A new tool for assembling
large shotgun sequencing projects’, Genome Science
and Technology, 1:9–19.
Zerbino D. & Birney E., 2008, ‘Velvet: Algorithms for de
novo short read assembly using de Bruijn graphs’,
Genome Research, 18:821–829.