Machine-Learning-Guided Library Design Cycle for Directed Evolution of Enzymes: The Effects of Training Data Composition on Sequence Space Exploration

Authors
Yutaka Saitô,Misaki Oikawa,T. Sato,Hikaru Nakazawa,Tsuyoshi Ito,Tomoshi Kameda,Koji Tsuda,Mitsuo Umetsu
Source
Journal: ACS Catalysis, Vol. 11, No. 23, pp. 14615-14624. Times cited: 17
Identifier
DOI:10.1021/acscatal.1c03753
Abstract

Machine learning (ML) is becoming an attractive tool in mutagenesis-based protein engineering because of its ability to design a variant library containing proteins with a desired function. However, it remains unclear how ML guides directed evolution in sequence space depending on the composition of training data. Here, we present an ML-guided directed evolution study of an enzyme to investigate the effects of a known "highly positive" variant (i.e., a variant known to have high enzyme activity) in the training data. We performed two separate series of ML-guided directed evolution of Sortase A, with and without a known highly positive variant called 5M in the training data. In each series, two rounds of ML were conducted: variants predicted in the initial round were experimentally evaluated and used as additional training data for the second round of prediction. The improvements in enzyme activity were comparable between the two series, both achieving enzyme activity 2.2–2.5 times higher than that of 5M. Intriguingly, the sequences of the improved variants differed substantially between the two series, indicating that ML guided the directed evolution toward distinct regions of sequence space depending on the presence or absence of the highly positive variant in the training data. This suggests that the sequence diversity of improved variants can be expanded not only by conventional ML using the whole training data but also by ML using a subset of the training data, even when that subset lacks highly positive variants. In summary, this study demonstrates the importance of regulating the composition of training data in ML-guided directed evolution.
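The two-round cycle described above (predict, evaluate experimentally, add the results to the training data, predict again), run once on the full training data and once on a subset lacking the highly positive variant, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the four-position, four-residue sequence space, the k-nearest-neighbour regressor on Hamming distance, the variant name `KKKD` (a hypothetical stand-in for 5M), and `mock_assay` (a hypothetical stand-in for the wet-lab Sortase A activity assay) are all invented for illustration.

```python
# Minimal sketch of a two-round ML-guided library design cycle, run twice:
# once with and once without a known "highly positive" variant in the training data.
# HYPOTHETICAL throughout: the toy alphabet, the k-NN model, the candidate space and
# mock_assay() are illustrative stand-ins, not the model or data used in the paper.
from itertools import product

ALPHABET = "ADKS"  # hypothetical residues allowed at each mutated position
LENGTH = 4         # hypothetical number of mutated positions


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: dict[str, float], seq: str, k: int = 3) -> float:
    """Predict activity as the mean activity of the k nearest training variants."""
    nearest = sorted(train, key=lambda s: (hamming(s, seq), s))[:k]
    return sum(train[s] for s in nearest) / len(nearest)


def mock_assay(seq: str) -> float:
    """Hypothetical stand-in for an experimental activity measurement."""
    weights = {"A": 0.0, "D": 0.5, "K": 1.0, "S": 0.2}
    return sum(weights[c] * (i + 1) for i, c in enumerate(seq))


def design_round(train: dict[str, float], n_pick: int = 5) -> list[str]:
    """Rank every untested candidate by predicted activity and return the top n_pick."""
    candidates = ("".join(p) for p in product(ALPHABET, repeat=LENGTH))
    untested = [s for s in candidates if s not in train]
    untested.sort(key=lambda s: (-knn_predict(train, s), s))
    return untested[:n_pick]


def run_series(initial: dict[str, float], rounds: int = 2, n_pick: int = 5):
    """Predict -> assay -> add to training data, repeated for the given number of rounds."""
    train = dict(initial)  # copy so the caller's training set is not modified
    history = []
    for _ in range(rounds):
        picks = design_round(train, n_pick)
        for s in picks:
            train[s] = mock_assay(s)  # "experimental" evaluation, mocked here
        history.append(picks)
    return train, history


# Hypothetical starting data: wild type, a few single mutants, one highly positive variant.
initial = {s: mock_assay(s) for s in ["AAAA", "DAAA", "AKAA", "AASA", "AAAD", "KKKD"]}
subset = {s: v for s, v in initial.items() if s != "KKKD"}  # same data minus "KKKD"

train_with, hist_with = run_series(initial)
train_without, hist_without = run_series(subset)

for label, hist, train in [("with KKKD", hist_with, train_with),
                           ("without KKKD", hist_without, train_without)]:
    best = max(train, key=train.get)
    print(f"{label}: rounds={hist}, best={best} ({train[best]:.1f})")

shared = set(sum(hist_with, [])) & set(sum(hist_without, []))
print("variants proposed by both series:", sorted(shared))
```

Removing a single entry from the training data changes which neighbours dominate the k-NN predictions, so the two series may propose different candidate sets. Any convergence or divergence seen in this toy landscape is only an illustration of the mechanics and says nothing about the reported Sortase A results.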