Abstract
Nonhuman primates (NHPs) such as monkeys are the closest living relatives of humans and are the best available models for causative studies of human health and disease. Gut microbiomes are intensively involved in host health. In this study, through large-scale cultivation of microbes from fecal samples of monkeys, we obtained previously uncultured bacterial species and constructed a Macaca fascicularis Gut Microbial Biobank (MfGMB). The MfGMB consists of 250 strains that represent 97 species of 63 genera, 25 families, and 4 phyla. Information on the 250 strains and the genomes of the 97 cultured species are publicly accessible. The MfGMB represented nearly 50% of the core gut microbial genera and covered over 80% of the known gut microbiome functions of M. fascicularis, as defined by KEGG Orthologs (KOs). Data mining showed that the bacterial species in the MfGMB were prevalent not only in NHP gut microbiomes but also in human gut microbiomes. This study will support the understanding and future investigation of how gut microbiomes interact with their mammalian hosts.

Gut microbiomes (GMs) contribute to human health and disease1. Nonhuman primates (NHPs) are the evolutionarily closest living relatives of human beings and share high similarity with them in genetics, anatomy, physiology, and behavior2. Thus, NHPs such as Macaca fascicularis are ideal models for studying human diseases whose intricate pathogenesis and phenotypes would be hard to replicate in other mammals. For instance, many studies on the development and treatment of brain disorders, such as autism and Parkinson's disease, have been performed in NHPs, and their results can be extrapolated to humans more reliably3, 4. Although the taxonomic composition and strain colonization of gut microbiota usually exhibit host preference5-7, previous results revealed that captivity humanizes the NHP gut microbiota8, 9.
In recent years, species of the genera Bifidobacterium10, Helicobacter11, Aeromonas12, and Lactobacillus13, as well as Limosilactobacillus gorillae14 and Alloscardovia theropitheci15, have been isolated from NHPs. Although the microbial ecology of the gastrointestinal tract of the rhesus monkey (Macaca mulatta) was studied as early as 1971,16 less effort has been devoted to the cultivation and collection of gut microbial resources from NHPs12, 13, 15 than from humans5, 6, 17-19, mice7, 20, and pigs21. An analysis of six available NHP metagenomic cohorts showed that many gut microbial species had not been cultivated and remained unexplored owing to the shortage of reference genomes and the lack of cultured bioresources22. Thus, an extensive collection of cultivable gut microbes from NHPs is of practical importance and would (1) facilitate causative studies of host–microbe interactions, (2) support the development of new interventions for GM dysbiosis, and (3) promote in-depth comparison of human and NHP GMs. In this study, we constructed a Macaca fascicularis Gut Microbial Biobank (MfGMB; homepage: https://nmdc.cn/mfgmb/ and http://www.cgmcc.net/english/mfgmb/) that consists of 250 strains representing 97 species of 63 genera, 25 families, and 4 phyla. The MfGMB harbors 32 novel species that were characterized and named according to the rules of the International Code of Nomenclature of Prokaryotes. Based on taxonomic studies, 13 novel genera and 1 novel family were proposed to accommodate the new bacterial species. In silico analysis revealed that the newly characterized bacterial taxa were prevalent in both monkey and human guts, and that the MfGMB genomes covered over 80.0% of the known functions (KEGG Orthologs) in the M. fascicularis global gut gene catalog22.
The construction of MfGMB began with large-scale cultivation of gut microbes: in total, 73 culture conditions (Materials and Methods, and Datasheets S1 and S2 in Supporting Information) were used to cultivate microbes from 16 fecal samples. More than 7000 colonies were picked for scale-up cultivation, and their 16S ribosomal RNA (rRNA) genes were sequenced, yielding 4100 pure bacterial isolates. The isolate IDs, their closest phylogenetic relatives, and 16S rRNA gene sequences are provided in Supporting Information Datasheet S3. Using a 16S rRNA gene sequence identity cutoff of 98.7%, the 4100 isolates were clustered into 97 taxa (Supporting Information Datasheet S5). One strain was selected to represent each taxon, and the genomes of these 97 representative strains were sequenced (Supporting Information Datasheet S6). The quality and purity of the 97 representative genomes were evaluated using CheckM (Supporting Information Materials). The genomes were of good quality: the average completeness of the assemblies was 97.06 ± 6.84% (median 99.18%), the average contamination was 0.96 ± 1.47% (median 0.43%), and the mean estimated quality score (completeness − 5 × contamination) was 92.25 ± 9.63% (median 95.98%). Of the 97 strains, 32 did not phylogenetically match any previously known species and represented potentially novel bacterial taxa. We characterized these 32 strains in terms of cell morphology, DNA sequence-based phylogeny and phylogenomics, genomic features, and BIOLOG tests, as described in the Methods (see Supporting Information Materials). This polyphasic taxonomic analysis showed that all 32 strains are novel species, of which 18 belong to previously described genera, 13 represent new genera, and 1 represents a new family.
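The clustering and quality-control steps above can be illustrated with a short Python sketch. It is not the pipeline used in this study: it assumes that pairwise 16S rRNA gene identities have already been computed (the `identity` callable and the toy isolates are hypothetical) and applies a simple greedy centroid rule at the 98.7% cutoff, together with the CheckM-style quality score (completeness − 5 × contamination) reported above.

```python
from statistics import mean, median, stdev

# Species-level cutoff on 16S rRNA gene identity (percent), as stated in the text.
SPECIES_CUTOFF = 98.7


def greedy_cluster(isolates, identity, cutoff=SPECIES_CUTOFF):
    """Greedy centroid clustering: each isolate joins the first existing
    centroid it matches at >= cutoff, otherwise it founds a new cluster.

    isolates: isolate IDs, processed in the given order (order matters).
    identity: hypothetical callable (a, b) -> percent identity of two 16S sequences.
    Returns a dict mapping centroid ID -> list of member isolate IDs.
    """
    clusters = {}
    for iso in isolates:
        for centroid in clusters:
            if identity(iso, centroid) >= cutoff:
                clusters[centroid].append(iso)
                break
        else:
            # No centroid is close enough: this isolate starts a new taxon.
            clusters[iso] = [iso]
    return clusters


def quality_score(completeness, contamination):
    """Estimated genome quality (percent) = completeness - 5 x contamination."""
    return completeness - 5 * contamination


def summarize(values):
    """Mean, sample standard deviation and median of a list of values."""
    return mean(values), stdev(values), median(values)


if __name__ == "__main__":
    # Hypothetical toy identities: A and B are >= 98.7% identical, C is distinct.
    pid = {
        frozenset(("A", "B")): 99.5,
        frozenset(("A", "C")): 95.0,
        frozenset(("B", "C")): 95.2,
    }

    def toy_identity(a, b):
        return 100.0 if a == b else pid[frozenset((a, b))]

    print(greedy_cluster(["A", "B", "C"], toy_identity))  # {'A': ['A', 'B'], 'C': ['C']}
```

Because the quality score is linear in completeness and contamination, the mean score equals the mean completeness minus five times the mean contamination (97.06 − 5 × 0.96 = 92.26), consistent with the reported 92.25 up to rounding. Greedy clustering is order-dependent, so a different processing order can shift cluster boundaries for borderline isolates.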
All new taxa were named according to the rules of the International Code of Nomenclature of Prokaryotes (ICNP)23; their protologues are provided in Table 1, and eight of the novel species have been described in detail elsewhere24. Finally, the MfGMB, comprising 250 strains of 97 species from 63 genera, 25 families, and 4 phyla, was deposited at the China General Microbiological Culture Collection Center (CGMCC) for public access. The taxonomic diversity of the MfGMB is displayed in Figure 1A, and detailed information on all 250 strains (e.g., original strain IDs/names, genome features, and accession numbers) is provided in Supporting Information Datasheet S4 and on the MfGMB homepage (https://nmdc.cn/mfgmb/ and http://www.cgmcc.net/english/mfgmb/). The 97 representative genomes are publicly accessible via NCBI and NMDC (see Data Availability). We compared the species-level composition of the MfGMB with gut microbial collections derived from humans5, 6, 17-19, mice20, 25, and pigs21, and found that the MfGMB contains unique species and thus expands the mammalian gut microbial biobanks. Combining all gut microbial culture collections from the same host, 851, 176, 110, and 97 gut microbial species have been cultured from human, mouse, pig, and monkey, respectively. The distribution and overlap of species among the host-specific collections are displayed in Figure 1B. Of the 97 MfGMB species, 44 (45.36%) are shared with the other collections and 53 (54.64%) are unique to the MfGMB (Figure 1A). The 53 unique species of the monkey gut belong to 42 genera and expand the mammalian gut microbial collections. Notably, 13 of the 53 unique species belong to core genera (defined below) detected by 16S rRNA amplicon sequencing; these genera are characteristic, host-specific, and dominant members of the monkey gut community.
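The shared/unique split across host-specific collections is a set comparison, sketched below in Python. The species names are hypothetical placeholders rather than MfGMB data, and pooling all non-monkey collections before comparing is our reading of how "shared" and "unique" are defined here.

```python
def overlap_summary(mfgmb, others):
    """Compare one culture collection with collections from other hosts.

    mfgmb:  set of species names in the focal collection.
    others: dict mapping host name -> set of species names cultured from it.
    Returns (shared, unique, per_host):
      shared   - focal species found in at least one other collection,
      unique   - focal species found in none of them,
      per_host - host -> number of focal species shared with that host.
    """
    pooled = set().union(*others.values())
    shared = mfgmb & pooled
    unique = mfgmb - pooled
    per_host = {host: len(mfgmb & species) for host, species in others.items()}
    return shared, unique, per_host


def percent(part, whole):
    """Percentage rounded to two decimals, as in the text."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    # Hypothetical placeholder species, not real MfGMB data.
    mf = {"sp1", "sp2", "sp3"}
    other = {"human": {"sp1", "x"}, "mouse": {"sp1", "sp2"}, "pig": {"y"}}
    shared, unique, per_host = overlap_summary(mf, other)
    print(sorted(shared), sorted(unique), per_host)
    # ['sp1', 'sp2'] ['sp3'] {'human': 1, 'mouse': 2, 'pig': 0}
    print(percent(44, 97), percent(53, 97))  # 45.36 54.64
```

Per-host counts can overlap (sp1 above is counted for both human and mouse), which is why the numbers of species shared with individual hosts, such as the 37, 47, and 34 given later for human, mouse, and pig, add up to more than the 44 shared species overall.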
To assess the diversity of the M. fascicularis gut microbiota and to evaluate how well the MfGMB represents the fecal microbial diversity in this study, we sequenced 16S rRNA gene amplicons from 161 fecal samples of M. fascicularis collected at different time points for the four batches of large-scale bacterial isolation. The datasets were processed with a standard USEARCH-based pipeline and annotated using LTP_vbiobank, a customized database built by supplementing the LTP database with the 16S rRNA gene sequences of the 32 novel taxa, as described in the Methods (see Supporting Information Materials). First, the alpha diversity of the gut microbiota in Batch 1 was significantly lower than that of the other three batches (Figure 1C), and Batch 1 also separated from the other three batches in beta diversity (Figure 1D). The taxonomic divergence between Batch 1 and the other three batches was also observed in taxonomic annotation-based analysis (Figure 1E–H). At the phylum level (Figure 1E), 97.8 ± 1.3% of the total reads were assigned to 15 phyla; Firmicutes and Bacteroidetes were the most abundant and together accounted for 92.6% of the total reads. In Batch 1, Bacteroidetes was the most dominant phylum (relative abundance [RA] = 76.0 ± 6.4%), followed by Firmicutes (RA = 21.5 ± 6.0%). In the other three batches, however, Firmicutes was the most abundant phylum (RA = 76.6 ± 12.0% for Batch 2, 70.6 ± 13.7% for Batch 3, and 60.6 ± 13.6% for Batch 4), and Bacteroidetes ranked second, with RA values ranging from 16.1% to 31.8%. Lower taxonomic ranks gave a similar picture: the composition of Batch 1 differed from that of the other three batches at the class, family, and genus levels (Figure 1F–H).
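The alpha- and beta-diversity comparisons and the relative abundances above can be illustrated with a minimal Python sketch. The text does not state which metrics underlie Figure 1C and D; the Shannon index and Bray-Curtis dissimilarity used here are assumed as common choices, and the read counts are hypothetical.

```python
import math


def relative_abundance(counts):
    """counts: dict taxon -> read count. Returns dict taxon -> fraction of reads."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)), skipping taxa with zero reads."""
    return -sum(p * math.log(p) for p in relative_abundance(counts).values() if p > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as taxon -> count
    dicts: sum(|a_i - b_i|) / sum(a_i + b_i); 0 = identical, 1 = no shared taxa."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den


if __name__ == "__main__":
    # Hypothetical phylum-level read counts for two samples.
    s1 = {"Bacteroidetes": 700, "Firmicutes": 250, "Other": 50}
    s2 = {"Bacteroidetes": 250, "Firmicutes": 700, "Other": 50}
    print(relative_abundance(s1)["Bacteroidetes"])  # 0.7
    print(round(shannon(s1), 3), round(shannon(s2), 3))
    print(round(bray_curtis(s1, s2), 3))  # 0.45
```

Bray-Curtis on raw counts assumes comparable sequencing depth; in practice samples are usually rarefied or converted to relative abundances first.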
These differences in taxonomic and compositional diversity may be attributable to the dietary change from formula milk at 2 months of age (Batch 1) to normal feed after 6 months of age (Batches 2, 3, and 4), and sampling hosts at different life stages may have facilitated the recovery of diverse gut microbes from experimental M. fascicularis. Next, to further evaluate the taxonomic representativeness of the MfGMB for the M. fascicularis gut microbiota, we compared the MfGMB taxa at the genus level with the combined 16S rRNA amplicon datasets (samples from the four batches merged, n = 161). On average, 60.6 ± 14.9% of the total reads were assigned to 155 genera, 31 of which were covered by the MfGMB. We defined genera with an average frequency of occurrence (FO) > 80% as “common genera,” genera with an average RA > 0.1% as “dominant genera,” and genera that were both common and dominant as “core genera”; by these criteria, 47, 38, and 37 genera were common, dominant, and core genera, respectively (Figure 2A). The MfGMB covered 38.3%, 47.4%, and 48.6% of the common, dominant, and core genera, respectively. Five newly described genera (Zhengyingia gen. nov., Huachunia gen. nov., Shaojiongia gen. nov., Qingshengia gen. nov., and Baoxiongia gen. nov.) are among the core genera. Thirty-two of the 97 MfGMB species represent new taxa. To assess the prevalence of these new taxa in NHP and human GMs, we first analyzed 25 available M. fascicularis gut metagenome samples as described in the Methods (see Supporting Information Materials). On average, 39.52 ± 17.73% of the reads were annotated, to 8409 species in total. As shown in Figure 2B, all 32 novel taxa were detected in all 25 samples (FO = 100%), with mean RA ranging from 0.0025 ± 0.0016% (Shaojiongia intestinisimiae gen. nov., sp. nov.) to 0.15 ± 0.15% (Huanchunia intestinalis gen. nov., sp. nov.).
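The genus categories defined above can be made concrete with a small Python sketch. The per-sample relative-abundance dictionaries are a hypothetical input format, and reading FO as the fraction of samples in which a genus is detected is our interpretation of the definition, not a documented detail of the authors' pipeline. The same FO and mean-RA calculation applies to the prevalence of the 32 novel taxa across the 25 metagenomes.

```python
def occurrence_and_mean_ra(samples):
    """samples: list of dicts genus -> relative abundance (fraction of reads).
    Returns (fo, ra): per-genus frequency of occurrence (fraction of samples in
    which the genus is detected) and mean relative abundance across all samples."""
    genera = set().union(*samples)
    n = len(samples)
    fo = {g: sum(1 for s in samples if s.get(g, 0) > 0) / n for g in genera}
    ra = {g: sum(s.get(g, 0) for s in samples) / n for g in genera}
    return fo, ra


def classify_genera(samples, fo_cutoff=0.80, ra_cutoff=0.001):
    """Common: FO > 80%; dominant: mean RA > 0.1%; core: both."""
    fo, ra = occurrence_and_mean_ra(samples)
    common = {g for g, v in fo.items() if v > fo_cutoff}
    dominant = {g for g, v in ra.items() if v > ra_cutoff}
    return common, dominant, common & dominant


def coverage(cultured, reference):
    """Fraction of reference genera that are represented in the culture collection."""
    return len(cultured & reference) / len(reference)


if __name__ == "__main__":
    # Hypothetical toy data: G1 is frequent and abundant, G2 frequent but rare,
    # G3 abundant where present but detected in only 2 of 5 samples.
    samples = [
        {"G1": 0.5, "G2": 0.0005, "G3": 0.4995},
        {"G1": 0.5, "G2": 0.0005, "G3": 0.4995},
        {"G1": 0.9995, "G2": 0.0005},
        {"G1": 0.9995, "G2": 0.0005},
        {"G1": 0.9995, "G2": 0.0005},
    ]
    common, dominant, core = classify_genera(samples)
    print(sorted(common), sorted(dominant), sorted(core))
    # ['G1', 'G2'] ['G1', 'G3'] ['G1']
```

As a consistency check on the reported coverage values, 18/47, 18/38, and 18/37 round to 38.3%, 47.4%, and 48.6%, respectively.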
Second, we analyzed the distribution of the 32 new taxa in human GMs by Kraken2-based annotation of 1129 fecal metagenomes from healthy humans. The 32 new taxa were also widespread in human guts (Figure 2C): every new taxon had an FO > 80%, and the mean FO reached 96.97%, indicating that the newly characterized taxa are prevalent in both monkey and human guts. Notably, the three most abundant of the 32 new taxa in humans (Blautia ovalis sp. nov., Blautia simiae sp. nov., and Blautia beijingensis sp. nov.) all belong to the genus Blautia, whereas Huanchunia intestinalis gen. nov., sp. nov., the most abundant new MfGMB taxon in monkey GMs (Figure 2B), ranked fourth in humans (Figure 2C). We also compared the genomes of the 32 new taxa with over 1000 metagenome-assembled genomes (MAGs) representing previously uncharacterized species of NHP GMs, which were recently constructed from six publicly available NHP metagenomic cohorts26. Fifteen of our new taxa matched MAGs, and four of them represented previously uncultured “dark” taxa for which no reference genome had been available before this study. To further explore the functional potential of the MfGMB genomes, we built a catalog of 313,603 nonredundant genes from the 97 MfGMB genomes (the MfGMB catalog) and compared it using BLAST against the M. fascicularis global gut microbial gene catalog of 1,991,169 nonredundant genes constructed by Li et al.26 (the global catalog). The MfGMB catalog covered 463,647 proteins in the global catalog at 40% identity and 70% coverage. It also substantially enriched the existing global catalog, contributing 123,771 new genes, as only 189,832 genes were shared by both catalogs. We then investigated how well the MfGMB genomes represent the annotated functions of the M. fascicularis GM.
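The 40% identity / 70% coverage criterion used to compare the two catalogs can be sketched as a filter over BLAST+ tabular output. The column layout (a custom `-outfmt "6 qseqid sseqid pident length qlen"`), the choice of global-catalog proteins as queries, and coverage defined as alignment length divided by query length are all assumptions for illustration; the text does not specify them. The second function only reproduces the gene-count arithmetic reported above.

```python
import csv

# Assumed BLAST+ output format: -outfmt "6 qseqid sseqid pident length qlen"
# Assumed orientation: global-catalog proteins are queries, MfGMB proteins subjects.


def covered_queries(lines, min_ident=40.0, min_cov=0.70):
    """Return IDs of query proteins with at least one hit at >= min_ident percent
    identity whose alignment spans >= min_cov of the query length."""
    covered = set()
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        qseqid, _sseqid, pident, length, qlen = row[:5]
        if float(pident) >= min_ident and int(length) / int(qlen) >= min_cov:
            covered.add(qseqid)
    return covered


def catalog_summary(n_new_catalog, n_shared, n_covered, n_reference):
    """Genes a new catalog adds beyond a reference, and the fraction of the
    reference catalog it covers."""
    new_genes = n_new_catalog - n_shared
    fraction_covered = n_covered / n_reference
    return new_genes, fraction_covered


if __name__ == "__main__":
    # Hypothetical hits: only g1 passes both thresholds.
    hits = ["g1\tm1\t45.0\t90\t100", "g2\tm2\t35.0\t95\t100", "g3\tm3\t80.0\t50\t100"]
    print(covered_queries(hits))  # {'g1'}
    new, frac = catalog_summary(313_603, 189_832, 463_647, 1_991_169)
    print(new, round(100 * frac, 1))  # 123771 23.3
```

With the counts from the text, this recovers the 123,771 added genes and the 23.3% sequence-level coverage of the global catalog quoted in the next paragraph.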
For this purpose, the 97 MfGMB genomes and the global gene catalog were annotated against the eggNOG 5.0 database27. In the 97 genomes, 180,019 genes were assigned to 6803 distinct KEGG Orthologs (KOs), whereas 955,272 genes of the global catalog were assigned to 7075 distinct KOs. A cumulative analysis of the KO profiles, in which the 97 genomes were added in random order, was conducted to determine how much of the global catalog they covered. As shown by the rarefaction curves (blue lines in Figure 2D), the MfGMB genomes covered 5733 of the KOs in the global catalog, accounting for 81.0% of the known functions of the M. fascicularis GM represented by the global gene catalog. When representativeness is quantified at the sequence level rather than the functional level, the 97 MfGMB genomes represented 23.3% of the gene sequences at 40% amino acid identity and 70% sequence coverage. Beyond this good recovery of functionally known genes, on average 40.11 ± 9.28% of the genes in each MfGMB genome had unknown functions (Supporting Information Datasheets S7 and S8), suggesting that the MfGMB can serve as a cultivable gene pool for culture-dependent studies of “dark” functions in M. fascicularis GMs.

In summary, we constructed a microbial biobank, the MfGMB, for monkey GMs. Information on the 250 bacterial strains of 97 species, as well as their genome data, is publicly available at the MfGMB homepage (https://nmdc.cn/mfgmb/ and http://www.cgmcc.net/english/mfgmb/). The MfGMB covered nearly 50% of the core genera of the GM samples (n = 161) from monkeys aged 0–5 years. In addition to the characteristic bacterial taxa represented by the 13 core genera, including five novel genera proposed in this study (Zhengyingia gen. nov., Huachunia gen. nov., Shaojiongia gen. nov., Qingshengia gen. nov., and Baoxiongia gen. nov.), the MfGMB shares 37, 47, and 34 species with the gut microbial biobanks of humans5, 6, 17-19, mice20, 25, and pigs21, respectively.
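The cumulative KO analysis behind the rarefaction curves in Figure 2D can be sketched in Python as follows. Genomes are added in random order and, after each addition, the number of global-catalog KOs covered so far is recorded; averaging over permutations gives the curve. The number of permutations, the seed, and the toy KO sets are hypothetical choices, not parameters reported for this study.

```python
import random
from statistics import mean


def ko_accumulation(genome_kos, reference_kos, permutations=100, seed=0):
    """Mean number of reference KOs covered after adding 1..n genomes in random order.

    genome_kos:    list of sets of KO identifiers, one set per genome.
    reference_kos: set of KO identifiers in the reference (global) catalog.
    """
    rng = random.Random(seed)
    n = len(genome_kos)
    curves = []
    for _ in range(permutations):
        order = rng.sample(range(n), n)  # a random permutation of genome indices
        covered, curve = set(), []
        for idx in order:
            covered |= genome_kos[idx] & reference_kos
            curve.append(len(covered))
        curves.append(curve)
    # Average over permutations at each step (1 genome, 2 genomes, ...).
    return [mean(step) for step in zip(*curves)]


if __name__ == "__main__":
    # Hypothetical KO sets; "K99999" is absent from the reference set.
    genomes = [{"K00001", "K00002"}, {"K00002", "K00003"}, {"K99999"}]
    reference = {"K00001", "K00002", "K00003", "K00004"}
    curve = ko_accumulation(genomes, reference)
    print(curve[-1], curve[-1] / len(reference))  # 3 0.75
    print(round(100 * 5733 / 7075, 1))  # 81.0
```

Only the shape of the curve depends on the genome order; its end point is always the union of all genomes' KOs within the reference set, which for the real data corresponds to the 5733 of 7075 global-catalog KOs (81.0%) reported above.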
The MfGMB and other mammalian gut microbial biobanks5, 6, 17-20, 25 provide diverse microbial resources and support causative, in-depth studies of microbe–microbe and microbe–host interactions at the species level. The MfGMB includes a previously unknown higher taxon, Zhongyuiaceae fam. nov., within the order Clostridiales. Zhongyuia ovalis is the first isolate of the proposed family, and its genome is only 1.88 Mb in size. Smaller genomes are often associated with host parasitism28, 29. The genome of Z. ovalis encodes 1609 putative genes, 30.52% of which have unknown functions. We observed that Z. ovalis assimilated N-acetyl-glucosamine, a component of mammalian cartilage and of some bacterial cell walls. Z. ovalis occurs widely in human GMs (FO = 89.46%, RA = 0.0133%), yet its functions and its interactions with hosts remain to be investigated. We found that Dysosmobacter is a core genus (FO > 80% and RA > 0.1%) in M. fascicularis GMs. Dysosmobacter welbionis was first isolated from human feces30 and was, until this study, the only species of the genus Dysosmobacter. A recent study showed that D. welbionis helps prevent diet-induced obesity31. The MfGMB reports two new species, Dysosmobacter brevis and Dysosmobacter acutus. Analysis of the D. brevis genome revealed butyrate synthesis pathways like those found in other bacteria32-34, so D. brevis is a potential butyrate producer. Because butyrate is the main energy source for colonocytes35, butyrate-producing bacteria play a key role in colonic health, including epithelial integrity. The new Dysosmobacter resources of the MfGMB should therefore support further causative studies and potential probiotic applications.

This study was financially supported by the National Key Research and Development Program of China (No. 2019YFA0905601), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDB38020300), and the China Microbiome Initiative (CMI) supported by the Chinese Academy of Sciences (CAS-CMI).

Conceived and designed the experiments: Shuang-Jiang Liu. Performed the experiments: Danhua Li, Rexiding Abuduaini, Mengxuan Du, Yujing Wang. Analyzed the data: Chang Liu, Haizheng Zhu, Honghe Chen, Nan Zhou. Coordinated sample collection: Yong Lu, Qiang Sun. Conducted microbial strain preservation: Yuhua Xin, Yuguang Zhou. Constructed the webpage and uploaded all data: Linhuan Wu, Juncai Ma. Drafted the manuscript: Danhua Li, Chang Liu. Approved the final version of the manuscript: Chengying Jiang, Shuang-Jiang Liu. All authors read and approved the final manuscript.

The authors declare no conflict of interest.

The ethics application (ION-2019043) was approved by the Institute of Neuroscience, Chinese Academy of Sciences.

The metadata generated and analyzed in this study are available as follows. All polyphasic taxonomic information on the 97 MfGMB species is available on the MfGMB homepage (https://nmdc.cn/mfgmb/). All 250 strains and their 16S rRNA gene sequences are accessible via the MfGMB page on the official CGMCC website (https://www.cgmcc.net/english/mfgmb/). All 16S rRNA gene sequences and genomes generated in this study are accessible via GenBank under the accession numbers listed in the Supplementary Datasets. The 16S rRNA gene amplicon data, metagenomic data, all 16S rRNA gene sequences, and draft/complete genomes are accessible in NCBI under BioProject PRJNA733006 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA733006) and in NMDC under project NMDC10017790 (https://nmdc.cn/resource/genomics/project/detail/NMDC10017790).