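Computer science
Fusion
Function (biology)
Artificial intelligence
Natural language processing
Machine learning
Linguistics
Biology
Philosophy
Evolutionary biology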
Authors
Yiheng Zhu, Shenglong Zhu, Xuan Yu, Yan He, Yan Liu, Xiaojun Xie, Dong-Jun Yu, Rui Ye
Identifier
DOI:10.1101/2025.03.27.645685
Abstract
Accurately identifying protein functions is essential for understanding life mechanisms and thus advancing drug discovery. Although biochemical experiments are the gold standard for determining protein functions, they are often time-consuming and labor-intensive. Here, we propose a novel composite deep-learning method, MKFGO, to infer Gene Ontology (GO) attributes by integrating five complementary pipelines built on multi-source biological data. MKFGO was rigorously benchmarked on 1522 non-redundant proteins, demonstrating superior performance over 11 state-of-the-art function prediction methods. Comprehensive data analyses revealed that the major advantage of MKFGO lies in its two deep-learning components, HFRGO and PLMGO, which derive handcrafted features and protein large language model (PLM)-based features, respectively, from protein sequences in different biological views, with effective knowledge fusion at the decision level. HFRGO leverages an LSTM-attention network embedded with handcrafted features, in which a triplet loss-based guilt-by-association strategy is designed to enhance the correlation between feature similarity and function similarity. PLMGO employs the PLM to capture feature embeddings with discriminative functional patterns from sequences. Meanwhile, the other three components provide complementary insights that further improve prediction accuracy, driven by protein-protein interaction, GO term probability, and protein-coding gene sequence, respectively. The source code and models of MKFGO are freely available at https://github.com/yiheng-zhu/MKFGO.
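As a rough illustration of what fusion at the decision level can look like, the sketch below combines per-GO-term confidence scores from several components with a weighted average. It is not the MKFGO implementation: the abstract does not say how the five pipelines are combined, so the function name `fuse_decisions`, the weighting scheme, and the example scores are all assumptions made here for illustration.

```python
from typing import Dict

# GO term identifier -> confidence score in [0, 1]
Scores = Dict[str, float]


def fuse_decisions(component_scores: Dict[str, Scores],
                   weights: Dict[str, float]) -> Scores:
    """Weighted average of per-term scores across components.

    Hypothetical helper: a component that does not score a term
    contributes 0.0 for that term. This is an assumed fusion rule,
    not the one used by MKFGO.
    """
    total_weight = sum(weights[name] for name in component_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Collect every GO term scored by at least one component.
    terms = set()
    for scores in component_scores.values():
        terms.update(scores)

    fused: Scores = {}
    for term in sorted(terms):
        weighted_sum = sum(
            weights[name] * scores.get(term, 0.0)
            for name, scores in component_scores.items()
        )
        fused[term] = weighted_sum / total_weight
    return fused


# Illustrative scores only; the values are invented for this example.
example_scores = {
    "HFRGO": {"GO:0005524": 0.8, "GO:0003677": 0.2},
    "PLMGO": {"GO:0005524": 0.6},
}
example_weights = {"HFRGO": 1.0, "PLMGO": 1.0}
fused_example = fuse_decisions(example_scores, example_weights)
```

With equal weights, a term scored highly by both components keeps a high fused score, while a term scored by only one component is pulled toward zero. In practice the weights would be tuned on validation data, which is again an assumption rather than something the abstract states.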