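Genome
Scalability
Computer science
Tree (set theory)
Quality (philosophy)
Computational biology
Artificial intelligence
Machine learning
Data mining
Biology
Gene
Genetics
Mathematics
Database
Mathematical analysis
Philosophy
Epistemology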
Authors
Alex Chklovski, Donovan H. Parks, Ben J. Woodcroft, Gene W. Tyson
Identifier
DOI:10.1101/2022.07.11.499243
Abstract
Advances in DNA sequencing and bioinformatics have dramatically increased the rate of recovery of microbial genomes from metagenomic data. Assessing the quality of metagenome-assembled genomes (MAGs) is a critical step prior to downstream analysis. Here, we present CheckM2, an improved method of predicting the completeness and contamination of MAGs using machine learning. We demonstrate the effectiveness of CheckM2 on synthetic and experimental data, and show that it outperforms the original version of CheckM in predicting MAG quality. CheckM2 is substantially faster than CheckM and its database can be rapidly updated with new high-quality reference genomes. We show that CheckM2 accurately predicts genome quality for MAGs from novel lineages, even those with sparse genomic representation, or reduced genome size (e.g. symbionts) such as those found in the Patescibacteria and the DPANN superphylum. CheckM2 provides accurate genome quality predictions across the microbial tree of life, giving increased confidence when inferring novel biological conclusions from MAGs.