ReHap: AN INTEGRATED SYSTEM FOR THE HAPLOTYPE ASSEMBLY PROBLEM FROM SHOTGUN SEQUENCING DATA
Filippo Geraci, Marco Pellegrini
2010
Abstract
Single nucleotide polymorphism (SNP) is the most common form of DNA variation. The set of SNPs present in a chromosome (called the haplotype) is of interest in a wide range of applications in molecular biology and biomedicine. Personalized haplotyping of all or part of an individual's chromosomes is one of the most promising basic ingredients of effective personalized medicine (including diagnosis and, eventually, therapy). Personalized haplotyping is now becoming technically and economically feasible thanks to steady progress in shotgun sequencing technologies (see, e.g., the 1000 Genomes Project, a deep catalogue of human genetic variation). One key algorithmic problem in this process is the haplotype assembly problem (also known as the single individual haplotyping problem), which is the problem of reconstructing the two haplotype strings (paternal and maternal) from the large collection of short fragments produced by PCR-based shotgun technology. Although many algorithms for this problem have been proposed in the literature, there has been little progress on comparing them on a common basis or on supporting the choice of the best algorithm for the type of fragments generated by a specific experiment. In this paper we present ReHap, an easy-to-use AJAX-based web tool that provides a complete experimental environment for comparing five different assembly algorithms under a variety of parameter settings. It accepts user-generated data as input and also provides several fragment-generation simulation tools. This is the first published report of a comparison among five different haplotype assembly algorithms on common data and a common algorithmic framework. The system is freely available to researchers at the URL: http://bioalgo.iit.cnr.it/rehap/.
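To make the problem statement concrete, the following is a minimal sketch of haplotype assembly on a toy instance. It is not one of the five algorithms compared in ReHap. It assumes the minimum error correction objective, a common formulation in the literature, and it assumes that only heterozygous SNP sites are considered, so that the second haplotype is the bitwise complement of the first. The function names, the `-` gap symbol, and the sample fragments are all hypothetical, and the exhaustive search is only viable for a handful of SNPs.

```python
from itertools import product

GAP = "-"  # assumption: marks a SNP position not covered by the fragment


def mismatches(fragment, haplotype):
    """Count covered positions where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != GAP and f != h)


def complement(haplotype):
    """Flip every allele (assumes every site is heterozygous)."""
    return "".join("1" if a == "0" else "0" for a in haplotype)


def assemble_brute_force(fragments):
    """Return (cost, h1, h2) minimising the number of allele corrections.

    Each fragment is charged its mismatches against the closer of the two
    haplotypes; every candidate h1 is tried, so the search is exponential
    in the number of SNP sites.
    """
    n = len(fragments[0])
    best = None
    for bits in product("01", repeat=n):
        h1 = "".join(bits)
        h2 = complement(h1)
        cost = sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)
        if best is None or cost < best[0]:
            best = (cost, h1, h2)
    return best


# Hypothetical aligned fragments over five SNP sites (0/1 = the two alleles).
fragments = [
    "010--",
    "-101-",
    "--011",
    "101--",
    "-0100",
    "--110",  # differs from both haplotypes: simulates a read error
]

cost, h1, h2 = assemble_brute_force(fragments)
print(cost, h1, h2)
```

Practical algorithms replace the exhaustive loop with heuristics or clustering, since the number of candidate haplotypes doubles with every additional SNP site.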
Paper Citation
in Harvard Style
Geraci F. and Pellegrini M. (2010). ReHap: AN INTEGRATED SYSTEM FOR THE HAPLOTYPE ASSEMBLY PROBLEM FROM SHOTGUN SEQUENCING DATA. In Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010) ISBN 978-989-674-019-1, pages 19-25. DOI: 10.5220/0002713400190025
in Bibtex Style
@conference{bioinformatics10,
author={Filippo Geraci and Marco Pellegrini},
title={ReHap: AN INTEGRATED SYSTEM FOR THE HAPLOTYPE ASSEMBLY PROBLEM FROM SHOTGUN SEQUENCING DATA},
booktitle={Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)},
year={2010},
pages={19-25},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002713400190025},
isbn={978-989-674-019-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)
TI - ReHap: AN INTEGRATED SYSTEM FOR THE HAPLOTYPE ASSEMBLY PROBLEM FROM SHOTGUN SEQUENCING DATA
SN - 978-989-674-019-1
AU - Geraci F.
AU - Pellegrini M.
PY - 2010
SP - 19
EP - 25
DO - 10.5220/0002713400190025