1 Department of Systems Biology, Technical University of Denmark
2 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
3 Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark
4 Center for Biological Sequence Analysis, Technical University of Denmark
5 Comparative Microbial Genomics, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
6 National Food Institute, Technical University of Denmark
7 Division of Epidemiology and Microbial Genomics, National Food Institute, Technical University of Denmark
8 Behavioral Phenomics, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
9 L'Institut National de la Recherche Agronomique
10 Department of Bio and Health Informatics, Technical University of Denmark
11 South China University of Technology
12 European Molecular Biology Laboratory
13 Metagenomics, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
14 Centre National de la Recherche Scientifique
15 University of Southern Denmark
16 University Hospital Vall d’Hebron
17 University of Copenhagen
18 Center for Energy Resources Engineering, Center, Technical University of Denmark
19 Vrije Universiteit Brussel
20 Beijing Genomics Institute Hong Kong
21 Wageningen IMARES
22 Tokyo Institute of Technology
23 South China University of Technology
24 European Molecular Biology Laboratory
25 Vrije Universiteit Brussel
26 Tokyo Institute of Technology
Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
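The core idea in the abstract is that genes from the same organism or co-inherited element rise and fall together in abundance across many samples. That shared pattern lets them be binned without a reference genome. The sketch below illustrates the idea with a simplified greedy grouping based on Pearson correlation. It is not the procedure used in the paper. The function name `bin_coabundant_genes` and the correlation threshold `min_corr=0.9` are hypothetical choices made for illustration.

```python
import numpy as np

def bin_coabundant_genes(abundance, min_corr=0.9):
    """Greedy co-abundance binning (hypothetical, simplified illustration).

    abundance: array of shape (n_genes, n_samples); each row is one gene's
    abundance profile across the metagenomic samples.
    min_corr: hypothetical Pearson-correlation cutoff for joining a bin.
    Returns a list of bins, each a list of gene indices.
    """
    abundance = np.asarray(abundance, dtype=float)
    # Gene-by-gene Pearson correlation of abundance profiles; constant
    # profiles yield NaN, which simply never passes the >= test below.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(abundance)
    unassigned = set(range(abundance.shape[0]))
    bins = []
    while unassigned:
        seed = min(unassigned)  # deterministic choice of the next seed gene
        # The seed always forms its own bin, plus every unassigned gene
        # whose profile correlates with it at or above the cutoff.
        members = [seed] + [
            g for g in sorted(unassigned)
            if g != seed and corr[seed, g] >= min_corr
        ]
        bins.append(members)
        unassigned.difference_update(members)
    return bins

# Toy example: 5 genes measured in 4 samples.
abundance = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],  # same pattern as gene 0
    [4, 3, 2, 1],
    [8, 6, 4, 2],  # same pattern as gene 2
    [1, 3, 2, 4],  # only partly similar to gene 0 (r = 0.8)
]
print(bin_coabundant_genes(abundance))  # [[0, 1], [2, 3], [4]]
```

A real pipeline would face millions of genes and hundreds of samples, so all-pairs correlation like this would not scale. The sketch only shows why co-varying abundance profiles can group genes into co-abundance gene groups.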
Nature Biotechnology, 2014, Vol 32, Issue 8, pp. 822-828
Keywords: short read alignment; sequences; systems; algorithms; microbiota; protein; life; sets; tree; tool