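Sampling (signal processing)
Unsupervised learning
Computer science
Bioinformatics
Sequence space
Sequence (biology)
Hierarchical clustering
Supervised learning
Protein sequencing
Cluster analysis
Artificial intelligence
Machine learning
Biology
Artificial neural network
Mathematics
Peptide sequence
Gene
Genetics
Biochemistry
Filter (signal processing)
Banach space
Pure mathematics
Computer vision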
Authors
Yuchi Qiu, Jian Hu, Guo-Wei Wei
Identifier
DOI:10.1038/s43588-021-00168-y
Abstract
Directed evolution, a strategy for protein engineering, optimizes protein properties (that is, fitness) by expensive and time-consuming screening or selection of a large mutational sequence space. Machine learning-assisted directed evolution (MLDE), which screens sequence properties in silico, can accelerate the optimization and reduce the experimental burden. This work introduces an MLDE framework, cluster learning-assisted directed evolution (CLADE), which combines hierarchical unsupervised clustering sampling and supervised learning to guide protein engineering. The clustering sampling selectively picks and screens variants in targeted subspaces, which guides the subsequent generation of diverse training sets. In the last stage, accurate predictions via supervised learning models improve the final outcomes. By sequentially screening 480 sequences out of 160,000 in a four-site combinatorial library with five equal experimental batches, CLADE achieves global maximal fitness hit rates of up to 91.0% and 34.0% for the GB1 and PhoQ datasets, respectively, improved from the values of 18.6% and 7.2% obtained by random sampling-based MLDE.

A machine learning-assisted directed evolution method is developed, combining hierarchical unsupervised clustering and supervised learning, to guide protein engineering by iteratively exploring the large mutational sequence space.
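The screening budget in the abstract can be made concrete with a small sketch. The code below is a toy illustration under stated assumptions, not the authors' CLADE implementation: the fitness landscape (`toy_fitness`), the clustering rule (`cluster_of`, which simply groups by the first residue rather than performing hierarchical clustering on sequence encodings), and the additive regression model (`fit_additive_model`) are all hypothetical stand-ins. It only shows the overall shape of the workflow: enumerate the 20^4 = 160,000 four-site variants, screen 480 of them in five batches of 96 with sampling biased toward clusters that have looked fitter so far, then train a supervised model on the screened variants and rank the whole library.

```python
import itertools
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
N_SITES = 4
TOTAL_SCREENED = 480
N_BATCHES = 5
BATCH_SIZE = TOTAL_SCREENED // N_BATCHES  # 96 variants per batch


def build_library():
    """Enumerate every four-site variant: 20**4 = 160,000 sequences."""
    return ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=N_SITES)]


def toy_fitness(seq, site_weights):
    """Hypothetical additive landscape standing in for an experimental assay."""
    return sum(site_weights[i][aa] for i, aa in enumerate(seq))


def cluster_of(seq):
    """Placeholder clustering (assumption): group variants by their first residue."""
    return seq[0]


def cluster_guided_sampling(library, assay, rng):
    """Screen variants batch by batch, favouring clusters with higher mean fitness."""
    clusters = {}
    for seq in library:
        clusters.setdefault(cluster_of(seq), []).append(seq)
    for members in clusters.values():
        rng.shuffle(members)  # pop() below then draws a random unscreened variant
    keys = list(clusters)
    screened = {}
    for _ in range(N_BATCHES):
        # Clusters with no data yet get the overall mean (uniform in batch one).
        overall = sum(screened.values()) / len(screened) if screened else 1.0
        weights = []
        for key in keys:
            seen = [f for s, f in screened.items() if cluster_of(s) == key]
            mean = sum(seen) / len(seen) if seen else overall
            weights.append(mean + 1e-9)  # keep every weight strictly positive
        for _ in range(BATCH_SIZE):
            key = rng.choices(keys, weights=weights, k=1)[0]
            seq = clusters[key].pop()
            screened[seq] = assay(seq)
    return screened


def fit_additive_model(screened):
    """Supervised stand-in: per-site, per-residue mean fitness of screened variants."""
    global_mean = sum(screened.values()) / len(screened)
    stats = [dict() for _ in range(N_SITES)]
    for seq, fit in screened.items():
        for i, aa in enumerate(seq):
            total, count = stats[i].get(aa, (0.0, 0))
            stats[i][aa] = (total + fit, count + 1)

    def predict(seq):
        score = 0.0
        for i, aa in enumerate(seq):
            # Residues never observed at a site fall back to the global mean.
            total, count = stats[i].get(aa, (global_mean, 1))
            score += total / count
        return score

    return predict


rng = random.Random(0)
site_weights = [{aa: rng.random() for aa in AMINO_ACIDS} for _ in range(N_SITES)]
library = build_library()


def assay(seq):
    return toy_fitness(seq, site_weights)


screened = cluster_guided_sampling(library, assay, rng)
predict = fit_additive_model(screened)
best_predicted = max(library, key=predict)
print(len(library), len(screened), best_predicted)
```

With these numbers the screened set covers 480 / 160,000 = 0.3% of the library. In a real application the placeholder pieces would be replaced by an actual assay, a clustering of sequence encodings, and a stronger regression model.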