克莱德
水准点(测量)
聚类分析
人工智能
计算机科学
采样(信号处理)
无监督学习
机器学习
生物
系统发育树
基因
遗传学
大地测量学
计算机视觉
滤波器(信号处理)
地理
标识
DOI:10.1021/acs.jcim.2c01046
摘要
Directed evolution, a revolutionary biotechnology in protein engineering, optimizes protein fitness by searching an astronomical mutational space via expensive experiments. The cluster learning-assisted directed evolution (CLADE) efficiently explores the mutational space via a combination of unsupervised hierarchical clustering and supervised learning. However, the initial-stage sampling in CLADE treats all clusters equally despite many clusters containing a large portion of non-functional mutations. Recent statistical and deep learning tools enable evolutionary density modeling to access protein fitness in an unsupervised manner. In this work, we construct an ensemble of multiple evolutionary scores to guide the initial sampling in CLADE. The resulting evolutionary score-enhanced CLADE, called CLADE 2.0, efficiently selects a training set within a small informative space using the evolution-driven clustering sampling. CLADE 2.0 is validated by using two benchmark libraries both having 160,000 sequences from four-site mutational combinations. Extensive computational experiments and comparisons with existing cutting-edge methods indicate that CLADE 2.0 is a new state-of-art tool for machine learning-assisted directed evolution.
科研通智能强力驱动
Strongly Powered by AbleSci AI