Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods
Jaime Seguel
2014
Abstract
This paper presents the mathematical definitions and results that prove the correctness of a parallel algorithm for mapping assembly. These concepts and facts establish the reach and limitations of combining the Smith-Waterman local alignment method with Hirschberg’s divide-and-conquer method for determining a longest common subsequence. The parallel algorithm whose correctness is proved is a general method, but it works best for the local alignment of a short sequence against a very large one, such as an entire genome. The method is thus suitable for mapping assembly, where millions of short sequence segments, the so-called reads, are aligned with a whole genome.
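As a rough illustration of the setting described in the abstract, the following Python sketch computes a Smith-Waterman local alignment score for a short read against a longer sequence while keeping only two rows of the dynamic-programming table, in the spirit of linear-space methods such as Hirschberg's. It is not the parallel algorithm of the paper: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def smith_waterman_score(read, genome, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_in_read, end_in_genome) of the best local alignment.

    Illustrative sketch only: scoring values and the linear gap penalty are
    assumptions, not taken from the paper. End positions are 1-based.
    """
    # prev holds the scores for the previous genome position; only two rows
    # are ever stored, so memory grows with len(read), not with
    # len(read) * len(genome).
    prev = [0] * (len(read) + 1)
    best = (0, 0, 0)
    for j, g in enumerate(genome, start=1):
        curr = [0] * (len(read) + 1)
        for i, r in enumerate(read, start=1):
            diag = prev[i - 1] + (match if r == g else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[i] + gap, curr[i - 1] + gap)
            curr[i] = score
            if score > best[0]:
                best = (score, i, j)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
```

With these assumed scores, the read `ACGT` aligns exactly with positions 3 to 6 of `TTACGTTT`, so the call returns `(8, 4, 6)`. Recovering the alignment itself, rather than only its score and end point, is where a divide-and-conquer traceback would come in.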
Paper Citation
in Harvard Style
Seguel J. (2014). Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014), ISBN 978-989-758-012-3, pages 221-226. DOI: 10.5220/0004883802210226
in Bibtex Style
@conference{bioinformatics14,
author={Jaime Seguel},
title={Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)},
year={2014},
pages={221-226},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004883802210226},
isbn={978-989-758-012-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)
TI - Mathematics of the Design of a Parallel Mapping Assembly Algorithm - Combining Smith-Waterman and Hirschberg’s LCS Methods
SN - 978-989-758-012-3
AU - Seguel J.
PY - 2014
SP - 221
EP - 226
DO - 10.5220/0004883802210226