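Sequence space
Sequence (biology)
Protein engineering
Protein sequencing
Protein design
Computer science
Computational biology
Encoding
Bioinformatics
Biology
Protein structure
Peptide sequence
Genetics
Mathematics
Gene
Biochemistry
Banach space
Enzyme
Pure mathematics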
Authors
Ben E. Clifton, Dan Kozome, Paola Laurino
Source
Journal: Biochemistry
[American Chemical Society]
Date: 2022-03-04
Volume (issue): 62 (2): 210-220
Citations: 11
Identifier
DOI:10.1021/acs.biochem.1c00757
Abstract
The rapid growth of sequence databases over the past two decades means that protein engineers faced with optimizing a protein for any given task will often have immediate access to a vast number of related protein sequences. These sequences encode information about the evolutionary history of the protein and the underlying sequence requirements to produce folded, stable, and functional protein variants. Methods that can take advantage of this information are an increasingly important part of the protein engineering tool kit. In this Perspective, we discuss the utility of sequence data in protein engineering and design, focusing on recent advances in three main areas: the use of ancestral sequence reconstruction as an engineering tool to generate thermostable and multifunctional proteins, the use of sequence data to guide engineering of multipoint mutants by structure-based computational protein design, and the use of unlabeled sequence data for unsupervised and semisupervised machine learning, allowing the generation of diverse and functional protein sequences in unexplored regions of sequence space. Altogether, these methods enable the rapid exploration of sequence space within regions enriched with functional proteins and therefore have great potential for accelerating the engineering of stable, functional, and diverse proteins for industrial and biomedical applications.