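Computer science
Transformer
Artificial intelligence
Protein structure prediction
Sequence (biology)
Embedding
Benchmark (surveying)
Algorithm
Machine learning
Pattern recognition (psychology)
Protein structure
Biology
Engineering
Biochemistry
Genetics
Geodesy
Voltage
Geography
Electrical engineering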
Authors
Wenkai Wang, Zhenling Peng, Jianyi Yang
Identifier
DOI:10.1101/2022.01.15.476476
Abstract
Single-sequence protein structure prediction remains challenging for AlphaFold2 and other deep learning methods. In this work, we introduce trRosettaX-Single, a novel algorithm for single-sequence protein structure prediction. It is built on sequence embeddings from s-ESM-1b, a supervised transformer protein language model optimized from the pre-trained model ESM-1b. The sequence embedding is fed into a multi-scale network with knowledge distillation to predict inter-residue 2D geometry, including distances and orientations. The predicted 2D geometry is then used to reconstruct 3D structure models by energy minimization. Benchmark tests show that trRosettaX-Single outperforms AlphaFold2 and RoseTTAFold on natural proteins. For instance, with single-sequence input, trRosettaX-Single generates structure models with an average TM-score of ~0.5 on 77 CASP14 domains, significantly higher than AlphaFold2 (0.35) and RoseTTAFold (0.34). A further test on 101 human-designed proteins indicates that trRosettaX-Single works very well, with accuracy (average TM-score 0.77) approaching that of AlphaFold2 and higher than that of RoseTTAFold, while using far fewer computing resources. On 2000 designed proteins from network hallucination, trRosettaX-Single generates structure models highly consistent with the hallucinated ones. These data suggest that trRosettaX-Single may find immediate applications in de novo protein design and related studies. trRosettaX-Single is available through the trRosetta server at: http://yanglab.nankai.edu.cn/trRosetta/ .
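The accuracy figures in the abstract are TM-scores, which the abstract does not define. As a rough illustration only, the sketch below evaluates the commonly used TM-score formula, TM = (1/L) Σ 1 / (1 + (d_i/d0)^2) with d0 = 1.24 (L - 15)^(1/3) - 1.8, for two sets of Cα coordinates that are assumed to be already aligned residue by residue and already superimposed. The function names `tm_d0` and `tm_score_fixed` are hypothetical, the short-chain floor on d0 is an assumption, and the search over superpositions that real TM-score programs perform is omitted, so the result is at best a lower bound on the true TM-score. It is not the authors' evaluation code.

```python
import math


def tm_d0(target_length: int) -> float:
    """Distance scale d0 (in angstroms) from the standard TM-score formula."""
    if target_length <= 21:
        # Assumption: use a small constant for very short chains, where the
        # cube-root expression becomes tiny or undefined.
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed(model_ca, native_ca, target_length=None):
    """TM-score term sum for pre-aligned, pre-superimposed C-alpha coordinates.

    model_ca and native_ca are equal-length sequences of (x, y, z) tuples.
    No superposition search is done, so this only bounds the true TM-score
    from below.
    """
    if len(model_ca) != len(native_ca):
        raise ValueError("coordinate lists must pair up residue by residue")
    length = target_length if target_length is not None else len(native_ca)
    if length <= 0:
        raise ValueError("target length must be positive")

    d0 = tm_d0(length)
    total = 0.0
    for model_atom, native_atom in zip(model_ca, native_ca):
        distance = math.dist(model_atom, native_atom)
        total += 1.0 / (1.0 + (distance / d0) ** 2)
    # Normalise by the target length, not by the number of aligned pairs.
    return total / length


if __name__ == "__main__":
    # Toy example: a straight 100-residue chain and a copy shifted by 2 angstroms.
    native = [(3.8 * i, 0.0, 0.0) for i in range(100)]
    model = [(x, y + 2.0, z) for (x, y, z) in native]
    print(round(tm_score_fixed(model, native), 3))
```

Because each residue contributes a value between 0 and 1 and the sum is divided by the target length, the score is length-normalised, which is what makes averages over different domains (such as the 77 CASP14 domains above) comparable.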