TRADING RUNNING TIME FOR MEMORY IN PHYLOGENETIC LIKELIHOOD COMPUTATIONS

Fernando Izquierdo-Carrasco, Julien Gagneur, Alexandros Stamatakis

2012

Abstract

The revolution in wet-lab sequencing techniques has given rise to a plethora of whole-genome and whole-transcriptome sequencing projects, often targeting 50 to 1000 species. This poses new challenges for efficiently computing the phylogenetic likelihood function, both for phylogenetic inference and for statistical post-analysis. The phylogenetic likelihood function, as deployed in maximum likelihood and Bayesian inference programs, consumes the vast majority of computational resources, that is, memory and CPU time. Here, we introduce and implement a novel, general, and versatile concept for trading additional computations for memory consumption in the likelihood function, which has a surprisingly small impact on overall execution times. When 50% of the required RAM is traded for additional computations, execution time increases by only 15% on average. We demonstrate that, for a phylogeny with n species, only log(n)+2 memory space is required for computing the likelihood. This is a promising result given the exponential growth of molecular datasets.
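To make the logarithmic memory claim more tangible, the following is a minimal sketch under a simplified model that is an assumption of this example, not the paper's implementation. It treats the tree as rooted and binary, assumes tips need no ancestral vector, and counts how many ancestral-vector slots must be live at once if each subtree is computed exactly once in the better of the two child orders. The names `min_vectors`, `balanced`, and `caterpillar` are hypothetical helpers, and the counts produced by this toy accounting need not coincide exactly with the paper's log(n)+2 figure.

```python
# A tree is either a tip (a string) or a pair (left, right) of subtrees.
# Simplified model (assumption): tips are raw sequence data and occupy no
# ancestral-vector slot; every inner node needs one slot for its vector.

def is_inner(node):
    return isinstance(node, tuple)

def min_vectors(node):
    """Fewest vector slots that must be live at once to compute `node`."""
    if not is_inner(node):
        return 0
    a, b = node
    ma, mb = min_vectors(a), min_vectors(b)
    ha, hb = int(is_inner(a)), int(is_inner(b))  # slots held after each child
    combine = ha + hb + 1            # both child vectors plus the parent's own
    a_first = max(ma, ha + mb, combine)  # hold a's vector while computing b
    b_first = max(mb, hb + ma, combine)  # hold b's vector while computing a
    return min(a_first, b_first)

def balanced(depth, name="t"):
    """Fully balanced tree with 2**depth tips."""
    if depth == 0:
        return name
    return (balanced(depth - 1, name + "0"), balanced(depth - 1, name + "1"))

def caterpillar(n):
    """Maximally unbalanced (ladder-shaped) tree with n >= 2 tips."""
    tree = ("t0", "t1")
    for i in range(2, n):
        tree = (tree, f"t{i}")
    return tree

if __name__ == "__main__":
    for depth in range(1, 6):
        print(2 ** depth, "tips, balanced:", min_vectors(balanced(depth)))
    print(32, "tips, caterpillar:", min_vectors(caterpillar(32)))
```

In this model the balanced tree is the demanding case, with the slot count growing by one each time the number of tips doubles, while a caterpillar tree needs a constant number of slots. Going below the number of vectors a traversal would normally keep resident is where the recomputation cost described in the abstract comes in.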



Paper Citation


in Harvard Style

Izquierdo-Carrasco F., Gagneur J. and Stamatakis A. (2012). TRADING RUNNING TIME FOR MEMORY IN PHYLOGENETIC LIKELIHOOD COMPUTATIONS. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012), ISBN 978-989-8425-90-4, pages 86-95. DOI: 10.5220/0003765600860095


in Bibtex Style

@conference{bioinformatics12,
author={Fernando Izquierdo-Carrasco and Julien Gagneur and Alexandros Stamatakis},
title={TRADING RUNNING TIME FOR MEMORY IN PHYLOGENETIC LIKELIHOOD COMPUTATIONS},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},
year={2012},
pages={86-95},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003765600860095},
isbn={978-989-8425-90-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)
TI - TRADING RUNNING TIME FOR MEMORY IN PHYLOGENETIC LIKELIHOOD COMPUTATIONS
SN - 978-989-8425-90-4
AU - Izquierdo-Carrasco F.
AU - Gagneur J.
AU - Stamatakis A.
PY - 2012
SP - 86
EP - 95
DO - 10.5220/0003765600860095