While phylogenetic and cluster analyses are often used to define clonal groups within bacterial species, identifying clonal groups that are associated with specific ecological niches or host species remains a challenge. We used Listeria monocytogenes, which causes invasive disease in humans and various animal species and can be isolated from a number of environments, including food, as a model organism to develop and implement a two-step statistical approach for identifying phylogenetic clades that are significantly associated with different source populations, including humans, animals, and food. The first step (SourceCluster test) tests the null hypothesis that the genetic distances between isolates are identical within and between source populations. If this null hypothesis can be rejected, the second step (TreeStats test) identifies particular clades in the phylogenetic tree with significant overrepresentation of sequences from a given source population. Analysis of sequence data for 120 L. monocytogenes isolates revealed evidence of clustering among isolates from the same source, based on the phylogenies inferred from actA and inlA (P = 0.02 and P = 0.07, respectively; SourceCluster test). In total, the TreeStats test identified 10 clades with significant (P < 0.05) or marginally significant (P < 0.10) associations with defined sources, including human-, animal-, and food-associated clusters. Epidemiological and virulence phenotype data supported the biological validity of the source-associated clonal groups identified here. Overall, our data show that (i) the SourceCluster and TreeStats tests can identify biologically meaningful source-associated phylogenetic clusters and (ii) L. monocytogenes includes clonal groups that have adapted to infect specific host species or colonize nonhost environments.
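The two-step logic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the test statistics, so the sketch makes two assumptions. For the first step it assumes a label-permutation test on the gap between the mean between-source and mean within-source pairwise distance. For the second step it assumes a one-sided hypergeometric tail probability for overrepresentation of one source within a clade. All function names (`mean_gap`, `source_permutation_test`, `clade_overrepresentation_p`) are hypothetical and chosen for illustration only.

```python
import math
import random
from itertools import combinations


def mean_gap(dist, labels):
    """Mean between-source distance minus mean within-source distance.

    dist is a symmetric matrix (list of lists) of pairwise genetic distances;
    labels[i] is the source of isolate i. Assumes at least one within-source
    pair and at least one between-source pair exist.
    """
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


def source_permutation_test(dist, labels, n_perm=999, seed=0):
    """Step 1 (assumed statistic): one-sided permutation p-value.

    Null hypothesis: distances within and between sources are identical,
    so shuffling the source labels should not change the gap.
    """
    rng = random.Random(seed)
    observed = mean_gap(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_gap(dist, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def clade_overrepresentation_p(n_total, n_source, clade_size, clade_source):
    """Step 2 (assumed statistic): hypergeometric upper-tail probability.

    Probability that a clade of clade_size isolates, drawn at random from
    n_total isolates of which n_source come from one source, contains at
    least clade_source isolates from that source.
    """
    upper = min(clade_size, n_source)
    tail = sum(
        math.comb(n_source, k) * math.comb(n_total - n_source, clade_size - k)
        for k in range(clade_source, upper + 1)
    )
    return tail / math.comb(n_total, clade_size)


if __name__ == "__main__":
    # Toy data: 8 isolates, two sources, small within-source distances.
    labels = ["human"] * 4 + ["food"] * 4
    dist = [[0.0 if i == j else (0.1 if labels[i] == labels[j] else 0.9)
             for j in range(8)] for i in range(8)]
    p_global = source_permutation_test(dist, labels)
    print("step 1 p-value:", p_global)
    if p_global < 0.10:  # proceed to clade-level tests only if step 1 rejects
        print("step 2 p-value:", clade_overrepresentation_p(8, 4, 4, 4))
```

In this sketch the clade-level test is run only when the global test rejects, mirroring the conditional structure of the two-step approach. A real analysis would take clades from an inferred phylogeny and would need to account for testing many clades.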
Journal of Clinical Microbiology, 2006, Vol 44, Issue 10, p. 3742-3751