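Sequence (biology)
Sequence space
Diffusion
Space (punctuation)
Computational biology
Protein design
Computer science
Biology
Genetics
Protein structure
Mathematics
Physics
Biochemistry
Discrete mathematics
Thermodynamics
Operating system
Banach space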
Authors
Sidney Lyayuga Lisanza, Jacob Merle Gershon, S. Tipps, Jeremiah Nelson Sims, Lucas Arnoldt, Samuel J. Hendel, Miriam K. Simma, Ge Liu, Muna Yase, Hongwei Wu, Claire D. Tharp, Xinting Li, Alex Kang, Evans Brackenbrough, Asim K. Bera, Stacey Gerben, Bruce J. Wittmann, Andrew C. McShan, David Baker
Identifier
DOI:10.1038/s41587-024-02395-w
Abstract
Protein denoising diffusion probabilistic models are used for the de novo generation of protein backbones but are limited in their ability to guide generation of proteins with sequence-specific attributes and functional properties. To overcome this limitation, we developed ProteinGenerator (PG), a sequence space diffusion model based on RoseTTAFold that simultaneously generates protein sequences and structures. Beginning from a noised sequence representation, PG generates sequence and structure pairs by iterative denoising, guided by desired sequence and structural protein attributes. We designed thermostable proteins with varying amino acid compositions and internal sequence repeats and cage bioactive peptides, such as melittin. By averaging sequence logits between diffusion trajectories with distinct structural constraints, we designed multistate parent-child protein triples in which the same sequence folds to different supersecondary structures when intact in the parent versus split into two child domains. PG design trajectories can be guided by experimental sequence-activity data, providing a general approach for integrated computational and experimental optimization of protein function.
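The logit-averaging idea mentioned in the abstract can be illustrated with a toy sketch. This is not the ProteinGenerator implementation: in PG the per-position logits would come from the RoseTTAFold-based network at each denoising step, whereas here a hypothetical helper, `one_hot_logits`, stands in for the model output of each structurally constrained trajectory. All function names below are assumptions made for illustration only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_logits(seq, strength=2.0):
    """Hypothetical stand-in for a model's output: an L x 20 logit table
    that favours the residues of `seq` with the given strength."""
    return [[strength if aa == target else 0.0 for aa in AMINO_ACIDS]
            for target in seq]


def average_logits(logits_a, logits_b):
    """Position-wise mean of two L x 20 logit tables, one per trajectory."""
    if len(logits_a) != len(logits_b):
        raise ValueError("trajectories must have the same sequence length")
    return [[(a + b) / 2.0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(logits_a, logits_b)]


def softmax(row):
    """Numerically stable softmax over one position's logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Pick the highest-scoring residue at each position."""
    return "".join(
        AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)]
        for row in logits
    )


# Two trajectories under different (assumed) structural constraints
# disagree at position 2; the averaged logits yield one shared sequence.
state_a = one_hot_logits("ACD", strength=3.0)
state_b = one_hot_logits("AKD", strength=1.0)
shared = average_logits(state_a, state_b)
consensus = decode(shared)
```

In this toy case the trajectory with the stronger preference wins at the contested position. In an actual diffusion setting the averaged logits would presumably feed the next denoising step rather than being decoded immediately.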