1 Department of Systems Biology, Technical University of Denmark
2 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
3 Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark
4 Integrative Systems Biology, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
5 South China University of Technology
6 BGI-Shenzhen
7 European Molecular Biology Laboratory
8 L'Institut National de la Recherche Agronomique
9 University of Copenhagen
10 Behavioral Phenomics, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
11 South China University of Technology
12 European Molecular Biology Laboratory
Many analyses of the human gut microbiome depend on a catalog of reference genes. Existing catalogs for the human gut microbiome are based on samples from single cohorts or on reference genomes or protein sequences, which limits coverage of global microbiome diversity. Here we combined 249 newly sequenced samples of the Metagenomics of the Human Intestinal Tract (MetaHit) project with 1,018 previously sequenced samples to create a cohort from three continents that is at least threefold larger than cohorts used for previous gene catalogs. From this we established the integrated gene catalog (IGC) comprising 9,879,896 genes. The catalog includes close-to-complete sets of genes for most gut microbes, which are also of considerably higher quality than in previous catalogs. Analyses of a group of samples from Chinese and Danish individuals using the catalog revealed country-specific gut microbial signatures. This expanded catalog should facilitate quantitative characterization of metagenomic, metatranscriptomic and metaproteomic data from the gut microbiome to understand its variation across populations in human health and disease.
Nature Biotechnology, 2014, Vol 32, Issue 8, p. 834-841