# Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods

### Jaime Seguel

#### Abstract

This paper focuses on mathematical definitions and results that prove the correctness of a parallel algorithm for mapping assembly. The mathematical concepts and facts discussed here establish the reach and limitations of a combination of the Smith-Waterman local alignment method and Hirschberg’s divide-and-conquer method for determining longest common subsequences. The parallel algorithm, whose correctness is proved, is a general method that works best for the local alignment of a short sequence with a very large one, such as an entire genome. The method is thus suitable for mapping assembly, where millions of short sequence segments, the so-called reads, are aligned with a whole genome.
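To make the setting concrete, the following is a minimal sequential sketch of Smith-Waterman local alignment scoring for a short read against a long sequence. It is not the parallel algorithm of the paper. The function name `smith_waterman_score` and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration. The sketch keeps only two rows of the dynamic-programming table, in the spirit of Hirschberg's linear-space idea, and reports the best score and where the alignment ends, without reconstructing the alignment itself.

```python
def smith_waterman_score(read, genome, match=2, mismatch=-1, gap=-1):
    """Return (best_score, end_in_read, end_in_genome) of the best local alignment.

    Only the previous and current rows of the score table are stored, so
    memory grows with len(genome) instead of len(read) * len(genome).
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(genome) + 1)
    best = (0, 0, 0)  # (score, 1-based end index in read, 1-based end index in genome)
    for i, r in enumerate(read, start=1):
        curr = [0] * (len(genome) + 1)
        for j, g in enumerate(genome, start=1):
            diag = prev[j - 1] + (match if r == g else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
```

In a mapping setting, a function of this kind would be called once per read, which is the workload a parallel method would distribute.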

#### Paper Citation

#### in Harvard Style

Seguel J. (2014). **Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods**. In *Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)* ISBN 978-989-758-012-3, pages 221-226. DOI: 10.5220/0004883802210226

#### in Bibtex Style

@conference{bioinformatics14,

author={Jaime Seguel},

title={Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods},

booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)},

year={2014},

pages={221-226},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0004883802210226},

isbn={978-989-758-012-3},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)

TI - Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods

SN - 978-989-758-012-3

AU - Seguel J.

PY - 2014

SP - 221

EP - 226

DO - 10.5220/0004883802210226