BUILDING VERY LARGE NEIGHBOUR-JOINING TREES

Martin Simonsen, Thomas Mailund, Christian N. S. Pedersen

Abstract

The neighbour-joining (NJ) method of Saitou and Nei is a widely used method for phylogenetic reconstruction, made popular by its combination of computational efficiency and reasonable accuracy. In the formulation of Studier and Kepler the method runs in cubic time; it scales comfortably to hundreds of species, and while it is usually possible to infer phylogenies with thousands of species, tens or hundreds of thousands of species are infeasible. We recently developed a simple branch-and-bound heuristic, RapidNJ, which significantly reduces the average running time. However, the O(n^2) space consumption of RapidNJ, and of the NJ method in general, becomes a problem when inferring phylogenies with more than 10,000 taxa. In this paper we present two extensions of RapidNJ which reduce its memory requirements and enable it to infer very large phylogenetic trees efficiently. We also present an improved search heuristic for RapidNJ which improves its performance on many data sets of all sizes.

References

  1. Aggarwal, A. and Vitter, J. S. (1988). The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116-1127.
  2. Alm, E. J., Huang, K. H., Price, M. N., Koche, R. P., Keller, K., Dubchak, I. L., and Arkin, A. P. (2005). The microbesonline web site for comparative genomics. Genome Research, 15(7):1015-1022.
  3. Elias, I. and Lagergren, J. (2005). Fast neighbour joining. In Proceedings of the 32nd International Colloquium on Automata, Languages and Programming (ICALP), volume 3580 of Lecture Notes in Computer Science, pages 1263-1274. Springer.
  4. Howe, K., Bateman, A., and Durbin, R. (2002). QuickTree: Building huge neighbour-joining trees of protein sequences. Bioinformatics, 18(11):1546-1547.
  5. Mailund, T., Brodal, G. S., Fagerberg, R., Pedersen, C. N. S., and Philips, D. (2006). Recrafting the neighbor-joining method. BMC Bioinformatics, 7(29).
  6. Mailund, T. and Pedersen, C. N. S. (2004). QuickJoin - fast neighbour-joining tree reconstruction. Bioinformatics, 20:3261-3262.
  7. Ott, M., Zola, J., Stamatakis, A., and Aluru, S. (2007). Large-scale maximum likelihood-based phylogenetic analysis on the IBM BlueGene/L. In Proceedings of the 2007 ACM/IEEE conference on Supercomputing, pages 1-11.
  8. Price, M. N., Dehal, P. S., and Arkin, A. P. (2009). FastTree: Computing large minimum-evolution trees with profiles instead of a distance matrix. Mol Biol Evol, 26(7):1641-1650.
  9. Saitou, N. and Nei, M. (1987). The Neighbor-Joining Method: a New Method for Reconstructing Phylogenetic Trees. Molecular Biology and Evolution, 4:406-425.
  10. Sheneman, L., Evans, J., and Foster, J. A. (2006). Clearcut: A fast implementation of relaxed neighbor-joining. Bioinformatics, 22(22):2823-2824.
  11. Simonsen, M., Mailund, T., and Pedersen, C. N. S. (2008). Rapid neighbour-joining. In Algorithms in Bioinformatics, Proceedings 8th International Workshop, WABI 2008, volume 5251, pages 113-123.
  12. Stamatakis, A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics, 22(21):2688-2690.
  13. Studier, J. A. and Kepler, K. J. (1988). A note on the neighbour-joining method of Saitou and Nei. Molecular Biology and Evolution, 5:729-731.
  14. Wheeler, T. J. (2009). Large-scale neighbor-joining with NINJA. In Algorithms in Bioinformatics, Proceedings 9th International Workshop, WABI 2009, volume 5724, pages 375-389.


Paper Citation


in Harvard Style

Simonsen M., Mailund T. and Pedersen C. N. S. (2010). BUILDING VERY LARGE NEIGHBOUR-JOINING TREES. In Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010) ISBN 978-989-674-019-1, pages 26-32. DOI: 10.5220/0002715700260032


in Bibtex Style

@conference{bioinformatics10,
author={Martin Simonsen and Thomas Mailund and Christian N. S. Pedersen},
title={BUILDING VERY LARGE NEIGHBOUR-JOINING TREES},
booktitle={Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)},
year={2010},
pages={26-32},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002715700260032},
isbn={978-989-674-019-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Bioinformatics - Volume 1: BIOINFORMATICS, (BIOSTEC 2010)
TI - BUILDING VERY LARGE NEIGHBOUR-JOINING TREES
SN - 978-989-674-019-1
AU - Simonsen M.
AU - Mailund T.
AU - Pedersen C. N. S.
PY - 2010
SP - 26
EP - 32
DO - 10.5220/0002715700260032