Machine-Learning-Guided Library Design Cycle for Directed Evolution of Enzymes: The Effects of Training Data Composition on Sequence Space Exploration

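Directed evolution; Sequence space; Sequence (biology); Directed molecular evolution; Protein engineering; Composition (language); Series (stratigraphy); Protein sequencing; Chemical space; Function (biology); Computer science; Computational biology; Biology; Artificial intelligence; Bioinformatics; Genetics; Peptide sequence; Biochemistry; Mathematics; Gene; Drug discovery; Linguistics; Mutant; Paleontology; Philosophy; Banach space; Pure mathematics
Authors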
Yutaka Saitô,Misaki Oikawa,T. Sato,Hikaru Nakazawa,Tsuyoshi Ito,Tomoshi Kameda,Koji Tsuda,Mitsuo Umetsu
Source
Journal: ACS Catalysis [American Chemical Society]
Volume (Issue): 11 (23): 14615-14624 Cited by: 17
Identifier
DOI:10.1021/acscatal.1c03753
Abstract

Machine learning (ML) is becoming an attractive tool in mutagenesis-based protein engineering because of its ability to design a variant library containing proteins with a desired function. However, it remains unclear how ML guides directed evolution in sequence space depending on the composition of training data. Here, we present an ML-guided directed evolution study of an enzyme to investigate the effects of a known “highly positive” variant (i.e., a variant known to have high enzyme activity) in training data. We performed two separate series of ML-guided directed evolution of Sortase A with and without a known highly positive variant called 5M in training data. In each series, two rounds of ML were conducted: variants predicted by the initial round were experimentally evaluated and used as additional training data for the second round of prediction. The improvements in enzyme activity were comparable between the two series, both achieving enzyme activity 2.2–2.5 times higher than 5M. Intriguingly, the sequences of the improved variants were largely different between the two series, indicating that ML guided the directed evolution to distinct regions of sequence space depending on the presence/absence of the highly positive variant in the training data. This suggests that the sequence diversity of improved variants can be expanded not only by conventional ML using the whole training data but also by ML using a subset of the training data even when it lacks highly positive variants. In summary, this study demonstrates the importance of regulating the composition of training data in ML-guided directed evolution.