Retraining
Computer science
Generalization
Artificial intelligence
Machine learning
Set (abstract data type)
Matching (statistics)
Process (computing)
Coding (set theory)
Mathematics
Statistics
Operating system
Mathematical analysis
Business
International trade
Programming language
Authors
Gustaf Ahdritz,Nazim Bouatta,Christina Floristean,Sachin Kadyan,Qinghui Xia,William Gerecke,Timothy O’Donnell,Daniel Berenberg,I. Fisk,Niccolò Zanichelli,Bo Zhang,Arkadiusz Nowaczynski,Bei Wang,Marta M. Stepniewska-Dziubinska,Shang Zhang,Adegoke A. Ojewole,Murat Efe Guney,Stella Biderman,Andrew M. Watkins,Stephen Ra
Identifier
DOI:10.1101/2022.11.20.517210
Abstract
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (i) tackle new tasks, like protein-ligand complex structure prediction, (ii) investigate the process by which the model learns, which remains poorly understood, and (iii) assess the model’s generalization capacity to unseen regions of fold space. Here we report OpenFold, a fast, memory-efficient, and trainable implementation of AlphaFold2. We train OpenFold from scratch, fully matching the accuracy of AlphaFold2. Having established parity, we assess OpenFold’s capacity to generalize across fold space by retraining it using carefully designed datasets. We find that OpenFold is remarkably robust at generalizing despite extreme reductions in training set size and diversity, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced by OpenFold during training, we also gain surprising insights into the manner in which the model learns to fold proteins, discovering that spatial dimensions are learned sequentially. Taken together, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial new resource for the protein modeling community.