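Computer science
Embedding
Proportion (ratio)
Function (biology)
Component (thermodynamics)
Language model
Protein function
Artificial intelligence
Data mining
Natural language processing
Machine learning
Biology
Biochemistry
Physics
Quantum mechanics
Evolutionary biology
Gene
Thermodynamics
Authors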
Shaojun Wang,Ronghui You,Yunjia Liu,Yi Xiong,Shanfeng Zhu
Identifier
DOI:10.1016/j.gpb.2023.04.001
Abstract
As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve prediction performance. However, it relies mainly on proteins with experimentally supported functional annotations and does not leverage the valuable information in the vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations [e.g., Evolutionary Scale Modeling (ESM)-1b embedding] from protein sequences through self-supervision. Here, we represented each protein by its ESM-1b embedding and used logistic regression (LR) to train a new model, LR-ESM, for AFP. The experimental results showed that LR-ESM achieved performance comparable to that of the best-performing component of NetGO 2.0. Therefore, by incorporating LR-ESM into NetGO 2.0, we developed NetGO 3.0 to extensively improve the performance of AFP. NetGO 3.0 is freely accessible at https://dmiip.sjtu.edu.cn/ng3.0.
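The idea behind LR-ESM, a fixed embedding per protein followed by logistic regression, can be illustrated with a minimal sketch. This is not the authors' implementation: the 4-dimensional vectors below are synthetic stand-ins for real ESM-1b embeddings, the two label columns are hypothetical functional terms, and the function names (`train_binary_lr`, `train_multilabel_lr`, `predict_scores`) and hyperparameters are assumptions chosen for illustration. One independent binary classifier is trained per term, which is one common way to apply logistic regression to a multi-label problem.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_lr(X, y, lr=0.5, epochs=300, l2=1e-3):
    """Fit one binary logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this protein on this term.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average gradient plus a small L2 penalty on the weights.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_multilabel_lr(X, Y):
    """Train one independent classifier per label column (one per term)."""
    n_labels = len(Y[0])
    return [train_binary_lr(X, [row[k] for row in Y]) for k in range(n_labels)]


def predict_scores(models, x):
    """Return one score in (0, 1) per term for a single embedding."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for w, b in models]


# Synthetic stand-ins for embeddings (assumption: real inputs would be
# precomputed ESM-1b vectors, one per protein).
random.seed(0)


def toy_embedding(center):
    return [c + random.gauss(0, 0.3) for c in center]


X = [toy_embedding([1, 0, 0, 0]) for _ in range(20)] + \
    [toy_embedding([0, 1, 0, 0]) for _ in range(20)]
# Hypothetical annotations: the first 20 proteins carry term 0, the rest term 1.
Y = [[1, 0]] * 20 + [[0, 1]] * 20

models = train_multilabel_lr(X, Y)
scores = predict_scores(models, [1.0, 0.0, 0.0, 0.0])
print(scores)
```

In this sketch the per-term scores play the role that a component's predictions would play before being combined with other components; how NetGO 3.0 actually combines LR-ESM with its other components is not described in the abstract and is not modeled here.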