Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods

Jaime Seguel

Abstract

This paper presents the mathematical definitions and results that prove the correctness of a parallel algorithm for mapping assembly. These concepts and facts establish the reach and limitations of combining the Smith-Waterman local alignment method with Hirschberg’s divide-and-conquer method for determining a longest common subsequence (LCS). The parallel algorithm whose correctness is proved is a general method that works best for the local alignment of a short sequence against a very large one, such as an entire genome. The method is therefore suitable for mapping assembly, in which millions of short sequence segments, the so-called reads, are aligned against a whole genome.
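The abstract combines two classical building blocks, and a small example may help make them concrete. The sketch below is only an illustration under stated assumptions. It is not the paper's parallel algorithm, and the function names and scoring values (match=2, mismatch=-1, gap=-2) are hypothetical choices. The first function is a Smith-Waterman scoring pass that scans the long reference while keeping a single DP column, so its memory grows with the read length rather than the genome length. The second is Hirschberg’s linear-space divide-and-conquer LCS, which recovers one longest common subsequence from forward and reverse score rows.

```python
# Illustrative sketch only; names and scoring parameters are assumptions,
# not taken from the paper.

def sw_best_cell(read, ref, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local-alignment scoring in O(len(read)) memory.

    Iterates over the (long) reference, keeping only the previous DP column
    indexed by read position. Returns (best_score, read_end, ref_end) with
    1-based end coordinates of the first best-scoring cell found.
    """
    m = len(read)
    prev = [0] * (m + 1)  # H[.][j-1]: column for the previous reference base
    best = (0, 0, 0)
    for j, b in enumerate(ref, start=1):
        cur = [0] * (m + 1)  # H[.][j]
        for i, a in enumerate(read, start=1):
            diag = prev[i - 1] + (match if a == b else mismatch)
            # Local alignment: scores are floored at 0.
            cur[i] = max(0, diag, prev[i] + gap, cur[i - 1] + gap)
            if cur[i] > best[0]:
                best = (cur[i], i, j)
        prev = cur
    return best


def lcs_lengths(x, y):
    """Last row of the LCS-length table: LCS(x, y[:k]) for every k, in O(len(y)) space."""
    row = [0] * (len(y) + 1)
    for a in x:
        new = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            new[j] = row[j - 1] + 1 if a == b else max(row[j], new[j - 1])
        row = new
    return row


def hirschberg_lcs(x, y):
    """One longest common subsequence of strings x and y (Hirschberg's method)."""
    if not x or not y:
        return ""
    if len(x) == 1:
        return x if x in y else ""
    mid = len(x) // 2
    n = len(y)
    left = lcs_lengths(x[:mid], y)               # LCS(x[:mid], y[:k])
    right = lcs_lengths(x[mid:][::-1], y[::-1])  # LCS(x[mid:], suffix of y of length t)
    # Split y at the k that maximizes LCS(x[:mid], y[:k]) + LCS(x[mid:], y[k:]).
    k = max(range(n + 1), key=lambda k: left[k] + right[n - k])
    return hirschberg_lcs(x[:mid], y[:k]) + hirschberg_lcs(x[mid:], y[k:])


if __name__ == "__main__":
    print(sw_best_cell("ACGT", "TTACGTTT"))      # (8, 4, 6)
    print(hirschberg_lcs("AGGTAB", "GXTXAYB"))   # GTAB
```

Iterating over the reference in the outer loop is the design choice that matters for a short read against a whole genome: only a column of length len(read) + 1 is ever stored.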



Paper Citation


in Harvard Style

Seguel J. (2014). Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014), ISBN 978-989-758-012-3, pages 221-226. DOI: 10.5220/0004883802210226


in Bibtex Style

@conference{bioinformatics14,
author={Jaime Seguel},
title={Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)},
year={2014},
pages={221-226},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004883802210226},
isbn={978-989-758-012-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)
TI - Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods
SN - 978-989-758-012-3
AU - Seguel J.
PY - 2014
SP - 221
EP - 226
DO - 10.5220/0004883802210226