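Extrapolation
Machine learning
Fitness function
Fitness landscape
Random forest
Computer science
Artificial intelligence
Mathematics
Genetic algorithm
Population
Mathematical analysis
Demography
Sociology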
Authors
Lin Chen, Zehong Zhang, Zhenghao Li, Rui Li, Ruifeng Huo, Lifan Chen, Dingyan Wang, Xiaomin Luo, Kaixian Chen, Cangsong Liao, Mingyue Zheng
Source
Journal: Cell Systems
[Elsevier]
Date: 2023-08-01
Volume (issue): 14 (8): 706-721.e5
Citations: 4
Identifier
DOI:10.1016/j.cels.2023.07.003
Abstract
One of the key points of machine learning-assisted directed evolution (MLDE) is the accurate learning of the fitness landscape, a conceptual mapping from sequence variants to the desired function. Here, we describe a multi-protein training scheme that leverages the existing deep mutational scanning data from diverse proteins to aid in understanding the fitness landscape of a new protein. Proof-of-concept trials are designed to validate this training scheme in three aspects: random and positional extrapolation for single-variant effects, zero-shot fitness predictions for new proteins, and extrapolation for higher-order variant effects from single-variant effects. Moreover, our study identified previously overlooked strong baselines, and their unexpectedly good performance brings our attention to the pitfalls of MLDE. Overall, these results may improve our understanding of the association between different protein fitness profiles and shed light on developing better machine learning-assisted approaches to the directed evolution of proteins. A record of this paper's transparent peer review process is included in the supplemental information.