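KEGG
Annotation
Cluster analysis
Artificial intelligence
Machine learning
Computer science
Gene annotation
Human Microbiome Project
Biology
Gene
Genome
Computational biology
Genetics
Gene expression
Transcriptome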
Authors
Michael Robben, Mohammad Sadegh Nasr, Avishek Das, Jai Prakash Veerla, Manfred Huber, Justyn Jaworski, Jon A. Weidanz, Jacob M. Luber
Identifier
DOI: 10.1089/cmb.2022.0370
Abstract
Tools for annotating genes in newly sequenced species have not evolved much beyond homologous alignment to previously annotated species. While the quality of gene annotations continues to decline as more evolutionarily distant gut microbiome species are sequenced and assembled, machine learning presents a high-quality alternative to traditional techniques. In this study, we investigate the relative performance of common classical and nonclassical machine learning algorithms on the problem of gene annotation, using genes from human microbiome-associated species in the KEGG database. The majority of the ensemble, clustering, and deep learning algorithms that we investigated showed higher prediction accuracy than CD-HIT in predicting partial KEGG function. Motif-based machine learning methods of annotation in new species were faster and had higher precision–recall than methods of homologous alignment or orthologous gene clustering. Gradient-boosted ensemble methods and neural networks also predicted higher connectivity in reconstructed KEGG pathways, finding twice as many new pathway interactions as BLAST alignment. The use of motif-based machine learning algorithms in annotation software will allow researchers to develop powerful tools to interact with bacterial microbiomes in ways previously unachievable through homologous sequence alignment alone.