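Generative grammar
Enzyme
Small molecule
Chemistry
Computational biology
Generative design
Biochemistry
Computer science
Biology
Materials science
Artificial intelligence
Compatibility (geochemistry)
Composite material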
Authors
Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Wengong Jin, Yang Yang, Lei Li
Source
Journal: Cornell University - arXiv
Date: 2024-05-13
Identifier
DOI: 10.48550/arxiv.2405.08205
Abstract
Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme's amino acid sequence and its three-dimensional (3D) coordinates based on functionally important sites and substrates corresponding to a desired catalytic function. These sites are automatically mined from enzyme databases. EnzyGen consists of a novel interleaving network of attention and neighborhood equivariant layers, which captures both long-range correlations across an entire protein sequence and local influence from the nearest amino acids in 3D space. To learn the generative model, we devise a joint training objective comprising a sequence generation loss, a position prediction loss, and an enzyme-substrate interaction loss. We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the Protein Data Bank (PDB). Experimental results show that EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity. These findings demonstrate EnzyGen's superior capability in designing well-folded and effective enzymes that bind specific substrates with high affinity.
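
The abstract names three loss terms but gives neither their exact form nor how they are combined. As a rough illustration only, the sketch below assumes a weighted sum of a cross-entropy sequence loss, a mean squared coordinate error, and a binary cross-entropy interaction loss. The function names, the loss forms, and the weights `w_seq`, `w_pos`, `w_int` are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical sketch: the loss forms and weights below are assumptions,
# not EnzyGen's published objective.

def sequence_loss(probs, target_ids):
    """Mean negative log-likelihood of the target residue at each position.

    probs: list of per-position probability lists over residue types.
    target_ids: list of target residue indices, one per position.
    """
    nll = -sum(math.log(p[t]) for p, t in zip(probs, target_ids))
    return nll / len(target_ids)

def position_loss(pred_xyz, true_xyz):
    """Mean squared distance between predicted and reference 3D coordinates."""
    total = 0.0
    for p, q in zip(pred_xyz, true_xyz):
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / len(true_xyz)

def interaction_loss(bind_prob, binds):
    """Binary cross-entropy for one enzyme-substrate binding prediction."""
    return -math.log(bind_prob) if binds else -math.log(1.0 - bind_prob)

def joint_loss(seq, pos, inter, w_seq=1.0, w_pos=1.0, w_int=1.0):
    """Weighted sum of the three terms (weights are assumed, not sourced)."""
    return w_seq * seq + w_pos * pos + w_int * inter

if __name__ == "__main__":
    # Toy inputs: two residues over a 2-letter alphabet, two 3D points.
    seq = sequence_loss([[0.5, 0.5], [0.25, 0.75]], [0, 1])
    pos = position_loss([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
                        [(1.0, 0.0, 0.0), (1.0, 1.0, 1.0)])
    inter = interaction_loss(0.8, True)
    print(joint_loss(seq, pos, inter))
```

In a real training setup each term would be computed on model outputs by an automatic differentiation framework; the plain-Python version here only shows how the three signals could feed a single scalar objective.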