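Computer science
Artificial intelligence
Benchmark (surveying)
Task (project management)
Set (abstract data type)
Machine learning
Annotation
Deep learning
Source code
Identification (biology)
Function (biology)
Natural language processing
Biology
Programming language
Genetics
Botany
Management
Geodesy
Economics
Geography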
Authors
Jia‐shun Wu,Yan Liu,Ying Zhang,Xiaoyu Wang,Yan He,Yiheng Zhu,Jiangning Song,Dong‐Jun Yu
Identifier
DOI:10.1021/acs.jcim.4c02092
Abstract
The accurate identification of protein-nucleotide binding residues is crucial for protein function annotation and drug discovery. Numerous computational methods have been proposed to predict these binding residues, achieving remarkable performance. However, because data are limited for many nucleotides and nucleotides vary widely, predicting binding residues across diverse nucleotides remains a significant challenge. To address these challenges, we propose NucGMTL, a new grouped deep multi-task learning approach for predicting the binding residues of all nucleotides observed in the BioLiP database. NucGMTL leverages pre-trained protein language models to generate robust sequence embeddings and incorporates multi-scale learning with scale-based self-attention mechanisms to capture a broader range of feature dependencies. To exploit the binding patterns shared across nucleotides, deep multi-task learning distills common representations, drawing auxiliary information from similar nucleotides selected by task grouping. On benchmark data sets, NucGMTL achieves an average area under the precision-recall curve (AUPRC) of 0.594, surpassing other state-of-the-art methods. Further analyses indicate that NucGMTL's advantage stems mainly from its effective integration of grouped multi-task learning and pre-trained protein language models. The data set and source code are freely available at https://github.com/jerry1984Y/NucGMTL.
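The headline result above is an AUPRC averaged over nucleotide types. As a rough illustration of how such a figure can be computed from residue-level predictions, the sketch below implements the step-wise "average precision" form of AUPRC and then takes an unweighted mean across nucleotide tasks. Both choices are assumptions: the paper's evaluation may integrate the precision-recall curve differently (for example, with the trapezoidal rule) or weight tasks differently. The function names and the example task labels ("ATP", "GTP") are hypothetical; the repository linked above is the authoritative source for the actual evaluation code.

```python
from typing import Dict, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a binding residue, 0 otherwise.
    scores: predicted binding probabilities, one per residue.
    Residues with tied scores share one threshold (assumed convention,
    matching the usual average-precision definition).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined without positive residues")

    # Rank residues by descending score; each prefix is one prediction set.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Evaluate only at the last residue of a tie group (a distinct threshold).
        last_in_group = rank == len(order) - 1 or scores[order[rank + 1]] != scores[i]
        if last_in_group:
            recall = tp / n_pos
            precision = tp / (tp + fp)
            # Add the precision-weighted recall increment at this threshold.
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap


def mean_auprc(tasks: Dict[str, Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted mean of per-nucleotide AUPRC values (hypothetical averaging scheme)."""
    values = [average_precision(y, s) for y, s in tasks.values()]
    return sum(values) / len(values)
```

For example, with a hypothetical "ATP" task (labels `[1, 0, 1, 0]`, scores `[0.9, 0.8, 0.7, 0.1]`, AUPRC 5/6) and a perfectly ranked "GTP" task (labels `[1, 1, 0]`, scores `[0.9, 0.8, 0.1]`, AUPRC 1.0), `mean_auprc` returns 11/12, about 0.917. AUPRC is generally preferred over ROC AUC in this setting because binding residues are a small minority of all residues, and precision reflects how many predicted binders are real.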