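Protein engineering
Bayesian optimization
Scalability
Fitness landscape
Computer science
Evolutionary algorithm
Artificial intelligence
Function (biology)
Directed evolution
Bayesian probability
Machine learning
Combinatorial optimization
Surrogate model
Algorithm
Biology
Mutant
Database
Enzyme
Genetics
Biochemistry
Population
Demography
Sociology
Gene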
Authors
Ruyun Hu, Lihao Fu, Yongcan Chen, Junyu Chen, Yu Qiao, Tong Si
Identifier
DOI:10.1101/2022.08.11.503535
Abstract
Protein engineering aims to find top functional sequences in a vast design space. For such an expensive “black-box” function optimization problem, Bayesian optimization is a principled, sample-efficient approach that is guided by a surrogate model of the objective function. Unfortunately, Bayesian optimization is computationally intractable over so vast a search space. Even worse, it proposes sequences sequentially, making it incompatible with batched wet-lab measurement. Here, we report a scalable and batched method, the Bayesian Optimization-guided EVOlutionary (BO-EVO) algorithm, to guide multiple rounds of robotic experiments to explore protein fitness landscapes of combinatorial mutagenesis libraries. We first examined various design specifications based on an empirical landscape of protein G domain B1. Then, BO-EVO was successfully generalized to another empirical landscape of an Escherichia coli kinase PhoQ, as well as simulated NK landscapes with up to moderate epistasis. This approach was then applied to guide robotic library creation and screening to engineer enzyme specificity of RhlA, a key biosynthetic enzyme for rhamnolipid biosurfactants. A 4.8-fold improvement in producing a target rhamnolipid congener was achieved after 4 iterations, having examined less than 1% of all possible mutants. Overall, BO-EVO proves to be an efficient and general approach to guide combinatorial protein engineering without prior knowledge.
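The abstract does not give implementation details, so the following is only a minimal sketch of how a batched, surrogate-guided evolutionary search of this general kind could be organized. It is not the authors' BO-EVO code. Everything in it is an assumption made for illustration: the 4-site, 20-letter library, the toy fitness function standing in for a wet-lab assay, the bootstrap ridge-regression ensemble used as the surrogate, the upper-confidence-bound acquisition score, single-site mutation as the evolutionary operator, and all names and default values (`bo_evo_sketch`, `batch_size=96`, and so on).

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 amino acids
N_SITES = 4                        # assumed 4-site library: 20**4 = 160,000 variants


def one_hot(seqs):
    """Encode integer sequences of shape (n, N_SITES) as flat one-hot vectors."""
    seqs = np.asarray(seqs)
    x = np.zeros((len(seqs), N_SITES * len(ALPHABET)))
    for site in range(N_SITES):
        x[np.arange(len(seqs)), site * len(ALPHABET) + seqs[:, site]] = 1.0
    return x


def make_toy_landscape(rng):
    """Hypothetical stand-in for an assay: additive effects plus one pairwise term."""
    additive = rng.normal(size=(N_SITES, len(ALPHABET)))
    pair = rng.normal(scale=0.5, size=(len(ALPHABET), len(ALPHABET)))

    def fitness(seqs):
        seqs = np.asarray(seqs)
        site_terms = additive[np.arange(N_SITES), seqs].sum(axis=1)
        return site_terms + pair[seqs[:, 0], seqs[:, 1]]

    return fitness


def fit_ensemble(x, y, rng, n_models=10, ridge=1.0):
    """Bootstrap ensemble of ridge regressions; model spread approximates uncertainty."""
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
        xb, yb = x[idx], y[idx]
        gram = xb.T @ xb + ridge * np.eye(x.shape[1])
        weights.append(np.linalg.solve(gram, xb.T @ yb))
    return np.array(weights)  # shape (n_models, n_features)


def ucb(weights, x, beta=1.0):
    """Upper confidence bound: ensemble mean plus beta times ensemble std."""
    preds = x @ weights.T  # shape (n_candidates, n_models)
    return preds.mean(axis=1) + beta * preds.std(axis=1)


def propose(parents, rng, n_children):
    """Evolutionary step: random single-site substitutions of randomly chosen parents."""
    children = parents[rng.integers(0, len(parents), size=n_children)].copy()
    sites = rng.integers(0, N_SITES, size=n_children)
    new_residues = rng.integers(0, len(ALPHABET), size=n_children)
    children[np.arange(n_children), sites] = new_residues
    return children


def bo_evo_sketch(fitness, rng, batch_size=96, n_rounds=4,
                  n_children=2000, n_parents=20):
    """Run n_rounds of batched measurement; return {sequence tuple: fitness}."""
    measured = {}
    batch = rng.integers(0, len(ALPHABET), size=(batch_size, N_SITES))
    batch = np.unique(batch, axis=0)  # random initial library, duplicates removed
    for round_idx in range(n_rounds):
        # "Measure" the whole batch at once, as a plate-based assay would.
        for seq, value in zip(map(tuple, batch), fitness(batch)):
            measured[seq] = value
        if round_idx == n_rounds - 1:
            break
        seqs = np.array(list(measured))
        ys = np.array(list(measured.values()))
        weights = fit_ensemble(one_hot(seqs), ys, rng)
        # Mutate the best measured sequences, then keep only unmeasured children.
        parents = seqs[np.argsort(ys)[-n_parents:]]
        children = np.unique(propose(parents, rng, n_children), axis=0)
        keep = np.array([tuple(c) not in measured for c in children], dtype=bool)
        children = children[keep]
        # Rank children by the surrogate's acquisition score and take a full batch.
        scores = ucb(weights, one_hot(children))
        batch = children[np.argsort(scores)[-batch_size:]]
    return measured
```

Calling `bo_evo_sketch(make_toy_landscape(rng), rng)` with `rng = np.random.default_rng(0)` measures at most 4 × 96 = 384 sequences, about 0.24% of the 160,000 variants in this toy library. The evolutionary step keeps the candidate pool small, so the surrogate never has to score the full design space, and ranking a whole pool at once yields a batch rather than one sequence at a time.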