PMSGA: A FAST DNA FRAGMENT ASSEMBLER

Juho Mäkinen, Jorma Tarhio, Sami Khuri

Abstract

DNA fragment assembly is an essential step in DNA sequencing projects. Because DNA sequencers output only short fragments, the original genome must be reconstructed from these reads. This paper presents a new fragment assembly algorithm, the Pattern Matching based String Graph Assembler (PMSGA). The algorithm uses multipattern matching to detect overlaps and a minimum cost flow algorithm to detect repeats. Special care was taken to reduce the algorithm's run time without compromising the quality of the assembly. PMSGA was compared with well-known fragment assemblers and was faster than them. It produced high-quality assemblies on prokaryotic data sets, and its results on eukaryotic data are comparable with those of the other assemblers.
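
The idea of detecting overlaps with multipattern matching can be illustrated with a minimal sketch. It is not the PMSGA implementation: it assumes error-free reads, ignores reverse complements, and replaces a tuned q-gram matcher with a plain dictionary lookup. The function name `find_overlaps` and the parameter `q` are hypothetical. The patterns are the first `q` characters of every fragment; each fragment is then scanned once for occurrences of any pattern, and every hit is verified as a full suffix-prefix match.

```python
from collections import defaultdict


def find_overlaps(fragments, q=4):
    """Return (i, j, length) triples where a suffix of fragments[i]
    equals a prefix of fragments[j] and the overlap has at least q characters.

    Illustrative assumption: exact matching only, no sequencing errors.
    """
    # Pattern set: the length-q prefix of every fragment, mapped to its owners.
    prefix_index = defaultdict(list)
    for j, frag in enumerate(fragments):
        if len(frag) >= q:
            prefix_index[frag[:q]].append(j)

    overlaps = []
    for i, frag in enumerate(fragments):
        # Start at position 1 so that a fragment fully contained as a prefix
        # of another fragment is not reported as an overlap.
        for pos in range(1, len(frag) - q + 1):
            window = frag[pos:pos + q]
            for j in prefix_index.get(window, ()):
                if j == i:
                    continue  # skip self-overlaps
                suffix = frag[pos:]
                # Verification step: the whole suffix must match the prefix.
                if fragments[j].startswith(suffix):
                    overlaps.append((i, j, len(suffix)))
    return overlaps


if __name__ == "__main__":
    reads = ["ACGTACGGA", "ACGGATTC", "GATTCAAG"]
    for i, j, length in find_overlaps(reads):
        print(f"read {i} -> read {j}: overlap of {length} bases")
```

In a string graph assembler, triples like these would become the edges between fragments. Handling sequencing errors would require approximate verification instead of `startswith`.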


Paper Citation


in Harvard Style

Mäkinen J., Tarhio J. and Khuri S. (2010). PMSGA: A FAST DNA FRAGMENT ASSEMBLER. In Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010) ISBN 978-989-674-019-1, pages 77-82. DOI: 10.5220/0002580800770082


in Bibtex Style

@conference{bioinformatics10,
author={Juho Mäkinen and Jorma Tarhio and Sami Khuri},
title={PMSGA: A FAST DNA FRAGMENT ASSEMBLER},
booktitle={Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)},
year={2010},
pages={77-82},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002580800770082},
isbn={978-989-674-019-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)
TI - PMSGA: A FAST DNA FRAGMENT ASSEMBLER
SN - 978-989-674-019-1
AU - Mäkinen J.
AU - Tarhio J.
AU - Khuri S.
PY - 2010
SP - 77
EP - 82
DO - 10.5220/0002580800770082