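Computational biology
Microbiome
Coding
Genome
Archaea
Bacterial protein
Biology
Bacterial genome size
Bacteria
Gut microbiota
Computer science
Bioinformatics
Gene
Genetics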
Authors
Yiqian Duan, Célio Dias Santos Júnior, Thomas Schmidt, Anthony Fullam, Breno L. S. de Almeida, Chengkai Zhu, Michael Kuhn, Xing-Ming Zhao, Peer Bork, Luís Pedro Coelho
Identifier
DOI:10.1038/s41467-024-51894-6
Abstract
Small open reading frames (smORFs) shorter than 100 codons are widespread and perform essential roles in microorganisms, where they encode proteins active in several cell functions, including signal pathways, stress response, and antibacterial activities. However, the ecology, distribution and role of small proteins in the global microbiome remain unknown. Here, we construct a global microbial smORFs catalog (GMSC) derived from 63,410 publicly available metagenomes across 75 distinct habitats and 87,920 high-quality isolate genomes. GMSC contains 965 million non-redundant smORFs with comprehensive annotations. We find that archaea harbor more smORFs proportionally than bacteria. We moreover provide a tool called GMSC-mapper to identify and annotate small proteins from microbial (meta)genomes. Overall, this publicly-available resource demonstrates the immense and underexplored diversity of small proteins.