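Sequence (biology)
Sequence space
Computer science
Space (punctuation)
Function (biology)
Fitness function
Artificial intelligence
Algorithm
Homology (biology)
Theoretical computer science
Machine learning
Mathematics
Genetic algorithm
Biology
Genetics
Banach space
Operating system
Biochemistry
Evolutionary biology
Gene
Pure mathematics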
Authors
Shuyi Zhang, Zi-yuan Ma, Wenjie Li, Yunhao Shen, Yunxin Xu, Gengjiang Liu, Jiamin Chang, Zeju Li, Hong Qin, Boxue Tian, H. Gong, David R. Liu, B. W. Thuronyi, Christopher A. Voigt
Source
Journal: Research Square
Date: 2024-02-23
Citations: 1
Identifier
DOI:10.21203/rs.3.rs-3930833/v1
Abstract
Designing proteins with improved functions requires a deep understanding of how sequence and function are related, a vast space that is hard to explore. The ability to efficiently compress this space by identifying functionally important features is extremely valuable. Here, we first establish a method called EvoScan to comprehensively segment and scan the high-fitness sequence space to obtain anchor points that capture its essential features, especially in high dimensions. Our approach is compatible with any biomolecular function that can be coupled to a transcriptional output. We then develop deep learning and large language models to accurately reconstruct the space from these anchors, allowing computational prediction of novel, highly fit sequences without prior homology-derived or structural information. We apply this hybrid experimental-computational method, which we call EvoAI, to a repressor protein and find that only 82 anchors are sufficient to compress the high-fitness sequence space with a compression ratio of 10^48. The extreme compressibility of the space informs both applied biomolecular design and understanding of natural evolution.