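Keywords
Generalizability theory
Artificial intelligence
Theory (learning stability)
Benchmark (surveying)
Computer science
Machine learning
Training set
Fitness function
Deep learning
Correlation
Test data
Mathematics
Genetic algorithm
Statistics
Geometry
Geodesy
Geography
Programming language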
Authors
Yunxin Xu, Di Liu, H. Gong
Identifier
DOI:10.1101/2023.05.28.542668
Abstract
Accurate prediction of the fitness and stability of a protein upon mutation is of great importance in protein engineering and design. Despite the rapid development of deep learning techniques and the accumulation of experimental data, the multi-labeled nature of fitness data hinders the training of robust deep-learning-based models for fitness and stability prediction. Here, we propose three geometric-learning-based models, GeoFitness, GeoDDG and GeoDTm, which predict the fitness score, ΔΔG and ΔTm of a protein upon mutation, respectively. To optimize GeoFitness, we designed a novel loss function that allows supervised training of a single unified model on the large amount of multi-labeled fitness data in the deep mutational scanning (DMS) database. In this way, GeoFitness efficiently learns the general functional effects of protein mutations and outperforms other state-of-the-art methods. To further improve the downstream tasks of ΔΔG and ΔTm prediction, we reused the encoder of GeoFitness as a pre-trained module in GeoDDG and GeoDTm, overcoming the scarcity of task-specific labeled data. This pre-training strategy, combined with data expansion, markedly improves model performance and generalizability. When evaluated on benchmark test sets (S669 for ΔΔG prediction and a newly collected set, S571, for ΔTm prediction), GeoDDG and GeoDTm outperform other state-of-the-art methods by at least 30% and 70%, respectively, in terms of the Spearman correlation coefficient between predicted and experimental values. An online server for the suite of these three predictors, GeoStab-suite, is available at http://structpred.life.tsinghua.edu.cn/server_geostab.html.
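The abstract does not spell out the GeoFitness loss. The sketch below is therefore an illustration, not the authors' method. It shows one generic way to train on multi-labeled fitness data, assuming that labels are only comparable within a single DMS assay. The hypothetical `within_assay_ranking_loss` is a pairwise logistic ranking loss that compares only mutants measured in the same assay. The block also includes a `spearman` function for the evaluation metric named above. All function names and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last element of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def within_assay_ranking_loss(preds, labels, assay_ids):
    """Hypothetical pairwise logistic ranking loss (not the paper's loss).

    Mutant pairs are formed only inside one assay, so labels from different
    DMS experiments, which may use different units and scales, are never
    compared with each other.
    """
    groups = defaultdict(list)
    for idx, assay in enumerate(assay_ids):
        groups[assay].append(idx)
    total, n_pairs = 0.0, 0
    for members in groups.values():
        for i, j in combinations(members, 2):
            if labels[i] == labels[j]:
                continue  # a tied pair carries no ordering information
            # sign is +1 when mutant i is measured as fitter than mutant j
            sign = 1.0 if labels[i] > labels[j] else -1.0
            margin = sign * (preds[i] - preds[j])
            total += softplus(-margin)  # small when the pair is ordered correctly
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


if __name__ == "__main__":
    # Toy data: assay "A" reports fitness in [0, 1]; assay "B" uses another scale.
    labels = [0.9, 0.1, 0.5, 120.0, 80.0]
    assays = ["A", "A", "A", "B", "B"]
    well_ordered = [2.0, -1.0, 0.5, 3.0, 1.0]
    badly_ordered = [-2.0, 1.0, 0.0, 1.0, 3.0]
    print(within_assay_ranking_loss(well_ordered, labels, assays))
    print(within_assay_ranking_loss(badly_ordered, labels, assays))
```

Because this objective depends only on the ordering of labels within each assay, it is unchanged by any strictly increasing rescaling of one assay's labels. That property is one reason rank-based losses are a common choice when labels come from heterogeneous experiments. A real model would compute the same quantity on batched tensors in a deep-learning framework.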