The neighbour-joining method by Saitou and Nei is a widely used method for phylogenetic reconstruction, made popular by a combination of computational efficiency and reasonable accuracy. With its cubic running time by Studier and Kepler, the method scales to hundreds of species, and while it is usually possible to infer phylogenies with thousands of species, tens or hundreds of thousands of species is infeasible. Recently we developed a simple branch and bound heuristic, RapidNJ, which significantly reduces the average running time. However, the O(n2) space consumption of the RapidNJ method, and the NJ method in general, becomes a problem when inferring phylogenies with 10000+ taxa. In this paper we present two extentions of RapidNJ which reduce memory requirements and enable RapidNJ to infer very large phylogenetic trees efficiently. We also present an improved search heuristic for RapidNJ which improves RapidNJ's performance on many data sets of all sizes.
Bioinformatics 2010: Proceedings of the First International Conference on Bioinformatics (part of the 3rd International Joint Conference on Biomedical Engineering Systems and Technologies, Biostec 2010), 2010