PYCOEVOL - A Python Workflow to Study Protein-protein Coevolution

Fábio Madeira, Ludwig Krippahl

Abstract

Protein coevolution has emerged as an important research topic. Several methods and scoring systems have been developed to quantify coevolution, though the quality of the results usually depends on the completeness of the biological data. To simplify the computation of coevolution indicators from these data, we have implemented a fully integrated and automated workflow, written in the Python scripting language, that enables efficient analysis of protein coevolution. Pycoevol automates access to remote or local databases and third-party applications, and also includes data processing functions. For a given protein complex under study, Pycoevol retrieves and processes all the information needed to carry out the analysis, namely homologous sequence search, multiple sequence alignment computation and coevolution analysis using a Mutual Information indicator. In addition, user-friendly outputs are created, namely histograms and heatmaps of inter-protein mutual information scores, as well as lists of significant coevolving residue pairs. An illustrative example is presented. Pycoevol is platform independent and is available under the General Public License from http://code.google.com/p/pycoevol.
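
The Mutual Information indicator mentioned in the abstract can be illustrated with a minimal sketch. This is not Pycoevol's own code: the function names `mutual_information` and `inter_protein_mi` are hypothetical, the alignments are assumed to be plain lists of equal-length strings whose rows are paired by organism, and pairs containing a gap (`-`) are simply skipped, which is only one of several possible conventions.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b hold one residue per sequence, in the same row order.
    Rows where either column has a gap are ignored (an assumption).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of rows")
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    count_ab = Counter(pairs)
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = c * n / (count_a[a] * count_b[b])
        mi += p_ab * log2(ratio)
    return mi


def inter_protein_mi(aln_a, aln_b):
    """Matrix of MI scores between every column of aln_a and of aln_b.

    Row i of the result corresponds to column i of aln_a, and entry j
    in that row to column j of aln_b.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignments must contain the same number of rows")
    cols_a = list(zip(*aln_a))
    cols_b = list(zip(*aln_b))
    return [[mutual_information(ca, cb) for cb in cols_b] for ca in cols_a]


if __name__ == "__main__":
    # Toy alignments: four paired sequences for protein A and protein B.
    protein_a = ["AC", "AC", "GT", "GT"]
    protein_b = ["KL", "KM", "RL", "RM"]
    for row in inter_protein_mi(protein_a, protein_b):
        print(row)
```

A matrix of this kind is what a heatmap of inter-protein scores would display, and residue pairs could be shortlisted by thresholding its entries. The choice of threshold and any correction for background signal are left open here.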


Paper Citation


in Harvard Style

Madeira F. and Krippahl L. (2012). PYCOEVOL - A Python Workflow to Study Protein-protein Coevolution. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012) ISBN 978-989-8425-90-4, pages 143-149. DOI: 10.5220/0003737901430149


in Bibtex Style

@conference{bioinformatics12,
author={Fábio Madeira and Ludwig Krippahl},
title={PYCOEVOL - A Python Workflow to Study Protein-protein Coevolution},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},
year={2012},
pages={143-149},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003737901430149},
isbn={978-989-8425-90-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)
TI - PYCOEVOL - A Python Workflow to Study Protein-protein Coevolution
SN - 978-989-8425-90-4
AU - Madeira F.
AU - Krippahl L.
PY - 2012
SP - 143
EP - 149
DO - 10.5220/0003737901430149