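Directed evolution
Sequence space
Sequence (biology)
Directed molecular evolution
Protein engineering
Composition (language)
Series (stratigraphy)
Protein sequencing
Chemical space
Function (biology)
Computer science
Computational biology
Enzyme
Biology
Artificial intelligence
Bioinformatics
Genetics
Peptide sequence
Biochemistry
Mathematics
Gene
Drug discovery
Linguistics
Mutant
Paleontology
Philosophy
Banach space
Pure mathematics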
Authors
Yutaka Saitô, Misaki Oikawa, T. Sato, Hikaru Nakazawa, Tsuyoshi Ito, Tomoshi Kameda, Koji Tsuda, Mitsuo Umetsu
Source
Journal: ACS Catalysis
Date: 2021-11-19
Volume (issue): pages: 11 (23): 14615-14624
Citations: 17
Identifier
DOI: 10.1021/acscatal.1c03753
Abstract
Machine learning (ML) is becoming an attractive tool in mutagenesis-based protein engineering because of its ability to design a variant library containing proteins with a desired function. However, it remains unclear how ML guides directed evolution in sequence space depending on the composition of training data. Here, we present an ML-guided directed evolution study of an enzyme to investigate the effects of a known “highly positive” variant (i.e., a variant known to have high enzyme activity) in training data. We performed two separate series of ML-guided directed evolution of Sortase A, with and without a known highly positive variant called 5M in the training data. In each series, two rounds of ML were conducted: variants predicted by the initial round were experimentally evaluated and used as additional training data for the second round of prediction. The improvements in enzyme activity were comparable between the two series, both achieving enzyme activity 2.2–2.5 times higher than 5M. Intriguingly, the sequences of the improved variants were largely different between the two series, indicating that ML guided the directed evolution to distinct regions of sequence space depending on the presence or absence of the highly positive variant in the training data. This suggests that the sequence diversity of improved variants can be expanded not only by conventional ML using the whole training data but also by ML using a subset of the training data, even when it lacks highly positive variants. In summary, this study demonstrates the importance of regulating the composition of training data in ML-guided directed evolution.
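The two-round protocol described in the abstract can be illustrated with a small toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the model is a plain ridge regression on one-hot encoded sequences (the abstract does not say which ML model was used), `toy_assay` is a hypothetical stand-in for the wet-lab activity measurement, and the four-residue sequences over the alphabet "AGLV" are invented. The only point is the loop structure: train, predict, measure the top predictions, retrain, and repeat the whole series with and without the best known variant in the initial training data.

```python
import itertools
import random

import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Encode a sequence as a flat one-hot vector of length len(seq) * 20."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, aa in enumerate(seq):
        x[i, ALPHABET.index(aa)] = 1.0
    return x.ravel()


def fit_ridge(seqs, activities, lam=1.0):
    """Fit ridge regression (an assumed model, not the paper's)."""
    X = np.array([one_hot(s) for s in seqs])
    y = np.asarray(activities, dtype=float)
    mean = y.mean()
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ (y - mean))
    return w, mean


def predict(model, seqs):
    """Predict activity for each sequence with a fitted (weights, mean) pair."""
    w, mean = model
    return np.array([one_hot(s) for s in seqs]) @ w + mean


def run_series(train, candidates, assay, rounds=2, batch=5):
    """Run `rounds` of train -> predict -> measure, growing the training set."""
    data = dict(train)
    for _ in range(rounds):
        model = fit_ridge(list(data), list(data.values()))
        pool = [s for s in candidates if s not in data]
        scores = predict(model, pool)
        top = [pool[i] for i in np.argsort(scores)[::-1][:batch]]
        for s in top:
            data[s] = assay(s)  # hypothetical stand-in for the experiment
    return data


def toy_assay(seq, target="LVGA"):
    """Hypothetical activity: number of positions matching a hidden target."""
    return float(sum(a == b for a, b in zip(seq, target)))


# Invented candidate library: all 4-residue sequences over a reduced alphabet.
candidates = ["".join(p) for p in itertools.product("AGLV", repeat=4)]
initial = {s: toy_assay(s) for s in random.Random(0).sample(candidates, 20)}

# The best variant in the initial data plays the role of the known
# "highly positive" variant; one series keeps it, the other drops it.
known_best = max(initial, key=initial.get)
reduced = {s: a for s, a in initial.items() if s != known_best}

with_best = run_series(initial, candidates, toy_assay)
without_best = run_series(reduced, candidates, toy_assay)

# Variants proposed and measured by each series (not in its starting data).
new_with = set(with_best) - set(initial)
new_without = set(without_best) - set(reduced)
```

Comparing `new_with` and `new_without` shows which variants each series proposed. On a toy additive landscape like this one the two sets may overlap heavily, so the sketch should not be read as reproducing the divergence reported in the abstract.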