IMPROVEMENTS TO A MULTIPLE PROTEIN SEQUENCE ALIGNMENT TOOL

André Atanasio M. Almeida; Zanoni Dias

doi:10.5220/0003789202260233

2012

Abstract

Sequence alignment is among the most common tasks in bioinformatics. It is a prerequisite for a wide range of procedures, such as searching a database for homologous sequences or predicting protein structure. The main goal of this work was to improve the accuracy of multiple sequence alignments. Our experiments concentrated on the MUMMALS multiple aligner and tested three distinct modifications to its algorithm. The first changed the substring length used by the k-mer count method. The second replaced the commonly used Dayhoff(6) alphabet with alternative compressed alphabets. The third modified the distance matrix computation and the guide tree construction. Each experiment yielded a gain in alignment accuracy.
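To make the first two modifications concrete, the following is a minimal sketch of k-mer counting over a compressed alphabet, with the substring length k and the alphabet grouping as the adjustable parts. It is not taken from MUMMALS: the function names, the shared k-mer fraction used as a similarity, and the mapping of unknown residues to "X" are all assumptions made for illustration. Only the six Dayhoff(6) classes are standard.

```python
from collections import Counter

# Dayhoff(6): the standard six-class compression of the 20 amino acids.
DAYHOFF6_GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]


def build_mapping(groups):
    """Map each residue to the index of its class, written as one character."""
    return {aa: str(i) for i, group in enumerate(groups) for aa in group}


def compress(seq, mapping):
    """Rewrite a protein sequence in the compressed alphabet.

    Residues outside the alphabet become 'X' (an assumption of this sketch).
    """
    return "".join(mapping.get(aa, "X") for aa in seq.upper())


def kmer_counts(seq, k, mapping):
    """Count every length-k substring of the compressed sequence."""
    comp = compress(seq, mapping)
    return Counter(comp[i:i + k] for i in range(len(comp) - k + 1))


def kmer_distance(a, b, k, mapping):
    """Return 1 minus the fraction of k-mers shared by the two sequences.

    The fraction is normalised by the k-mer count of the shorter sequence.
    This is one possible definition, not necessarily the one MUMMALS uses.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("k is longer than the shorter sequence")
    ca, cb = kmer_counts(a, k, mapping), kmer_counts(b, k, mapping)
    shared = sum((ca & cb).values())  # multiset intersection of the counts
    return 1.0 - shared / denom


def distance_matrix(seqs, k, mapping):
    """Symmetric pairwise distance matrix, the usual input to a guide tree."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = kmer_distance(seqs[i], seqs[j], k, mapping)
    return d


if __name__ == "__main__":
    mapping = build_mapping(DAYHOFF6_GROUPS)
    seqs = ["ACDEFGHIKL", "ACDEFGHIKV", "WWWWCCCCWW"]  # toy sequences
    for row in distance_matrix(seqs, k=3, mapping=mapping):
        print(row)
```

Trying a different substring length means changing `k`, and trying a different compressed alphabet means passing another list of groups to `build_mapping`. The resulting matrix could then be handed to any tree-building method, such as UPGMA or neighbor-joining.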

Paper Citation

in Harvard Style

Atanasio M. Almeida A. and Dias Z. (2012). IMPROVEMENTS TO A MULTIPLE PROTEIN SEQUENCE ALIGNMENT TOOL. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012) ISBN 978-989-8425-90-4, pages 226-233. DOI: 10.5220/0003789202260233

in Bibtex Style

@conference{bioinformatics12,
author={André Atanasio M. Almeida and Zanoni Dias},
title={IMPROVEMENTS TO A MULTIPLE PROTEIN SEQUENCE ALIGNMENT TOOL},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},
year={2012},
pages={226-233},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003789202260233},
isbn={978-989-8425-90-4},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)
TI - IMPROVEMENTS TO A MULTIPLE PROTEIN SEQUENCE ALIGNMENT TOOL
SN - 978-989-8425-90-4
AU - Atanasio M. Almeida A.
AU - Dias Z.
PY - 2012
SP - 226
EP - 233
DO - 10.5220/0003789202260233