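Annotation
Protein function prediction
Computer science
Genome
Computational biology
Protein function
Homology (biology)
Protein sequencing
Robustness (evolution)
Artificial intelligence
Machine learning
Biology
Peptide sequence
Genetics
Gene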
Authors
Jun Wu,Qing He,Jian Ouyang,Jiajia Zhang,Zihao Gao,Christopher E. Mason,Zhichao Liu,Tieliu Shi
Abstract
Predicting protein function from amino acid sequence alone is an extremely challenging but important task, especially in metagenomics and metatranscriptomics, where novel proteins from new microorganisms are being uncovered at an exponential rate. Many of these proteins have extremely low homology to known proteins and cannot be annotated with homology-based or information-integrative methods. To overcome this problem, we propose a Homology Independent protein Function annotation method (HiFun) based on a unified deep-learning model that reassembles the sequence as protein language. The robustness of HiFun was evaluated using the benchmark datasets and metrics of the CAFA3 challenge. To demonstrate the utility of HiFun, we annotated 2 212 663 unknown proteins and discovered novel motifs in the UHGP-50 catalog. We show that HiFun can extract latent function-related structural features, which enables it to annotate the functions of non-homologous proteins. HiFun can substantially improve the annotation of newly discovered proteins and expand our understanding of microorganisms' adaptation to various ecological niches. Moreover, we provide a free and accessible web service at http://www.unimd.org/HiFun that requires only protein sequences as input, offering researchers an efficient and practical platform for predicting protein functions.