Keywords
perplexity, computer science, sequence (biology), protein design, inverse folding, parameterized complexity, artificial intelligence, CASP, protein folding, protein structure prediction, protein sequencing, protein structure, language model, pattern recognition, machine learning, algorithms, peptide sequence, mathematics, biology, biochemistry, geometry, gene, engineering
Authors
Kevin Yang, Niccolò Zanichelli, Hugh Yeh
Source
Journal: Protein Engineering, Design & Selection (Oxford University Press)
Date: 2022-10-26
Volume: 36
Cited by: 24
Identifier
DOI: 10.1093/protein/gzad015
Abstract
Self-supervised pretraining on protein sequences has led to state-of-the-art performance on protein function and fitness prediction. However, sequence-only methods ignore the rich information contained in experimental and predicted protein structures. Meanwhile, inverse folding methods reconstruct a protein's amino-acid sequence given its structure, but do not take advantage of sequences that do not have known structures. In this study, we train a masked inverse-folding protein language model parameterized as a structured graph neural network. During pretraining, this model learns to reconstruct corrupted sequences conditioned on the backbone structure. We then show that using the outputs from a pretrained sequence-only protein masked language model as input to the inverse folding model further improves pretraining perplexity. We evaluate both of these models on downstream protein engineering tasks and analyze the effect of using information from experimental or predicted structures on performance.
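The core pretraining objective described in the abstract, BERT-style masking of residues reconstructed conditioned on backbone structure, can be illustrated with a short sketch. The following is a minimal PyTorch illustration, not the authors' implementation: all names (StructureEncoder, MaskedInverseFolding, MASK_ID, pretraining_step) are hypothetical stand-ins, and the toy per-residue coordinate projection only gestures at the structured graph neural network the paper actually uses.

```python
# Minimal sketch of masked inverse-folding pretraining (assumed names/shapes).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 20          # amino-acid alphabet (sketch: no special tokens)
MASK_ID = VOCAB     # extra token id marking corrupted positions
D = 128             # hidden width, arbitrary for the sketch

class StructureEncoder(nn.Module):
    """Placeholder for the structured GNN over backbone coordinates."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(3, D)   # toy: embed per-residue CA coordinates
    def forward(self, coords):        # coords: (B, L, 3)
        return self.proj(coords)

class MaskedInverseFolding(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB + 1, D)   # +1 for the mask token
        self.struct = StructureEncoder()
        self.mix = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(D, VOCAB)
    def forward(self, seq_tokens, coords):
        # Sequence and structure features are fused, then decoded per residue.
        h = self.tok(seq_tokens) + self.struct(coords)
        return self.head(self.mix(h))           # (B, L, VOCAB) logits

def pretraining_step(model, seq, coords, mask_frac=0.15):
    """Corrupt a fraction of residues; predict them given the backbone."""
    mask = torch.rand(seq.shape) < mask_frac
    corrupted = seq.masked_fill(mask, MASK_ID)
    logits = model(corrupted, coords)
    # Loss only on corrupted positions, as in BERT-style masked LM training.
    return F.cross_entropy(logits[mask], seq[mask])

# Toy usage: a batch of 2 proteins of length 50 with random coordinates.
model = MaskedInverseFolding()
seq = torch.randint(0, VOCAB, (2, 50))
coords = torch.randn(2, 50, 3)
loss = pretraining_step(model, seq, coords)
loss.backward()
```

The abstract's second variant feeds the outputs of a pretrained sequence-only masked language model into the inverse folding model; in this sketch that would amount to substituting those embeddings for self.tok(seq_tokens). The pretraining perplexity the abstract reports improving corresponds to exp(loss) for this cross-entropy.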