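Autoencoder
Computer science
Fitness function
Artificial intelligence
Fitness landscape
Sequence space
Transformer
Protein sequencing
Sequence (biology)
Machine learning
Deep learning
Genetic algorithm
Mathematics
Biology
Peptide sequence
Genetics
Gene
Physics
Banach space
Sociology
Demography
Quantum mechanics
Voltage
Pure mathematics
Population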
Authors
Egbert Castro, Abhinav Godavarthi, Julian Rubinfien, Kevin B. Givechian, Dhananjay Bhaskar, Smita Krishnaswamy
Identifier
DOI:10.1038/s42256-022-00532-1
Abstract
The development of powerful natural language models has improved the ability to learn meaningful representations of protein sequences. In addition, advances in high-throughput mutagenesis, directed evolution and next-generation sequencing have allowed for the accumulation of large amounts of labelled fitness data. Leveraging these two trends, we introduce Regularized Latent Space Optimization (ReLSO), a deep transformer-based autoencoder, which features a highly structured latent space that is trained to jointly generate sequences as well as predict fitness. Through regularized prediction heads, ReLSO introduces a powerful protein sequence encoder and a novel approach for efficient fitness landscape traversal. Using ReLSO, we explicitly model the sequence–function landscape of large labelled datasets and generate new molecules by optimizing within the latent space using gradient-based methods. We evaluate this approach on several publicly available protein datasets, including variant sets of anti-ranibizumab and green fluorescent protein. We observe a greater sequence optimization efficiency (increase in fitness per optimization step) using ReLSO compared with other approaches, where ReLSO more robustly generates high-fitness sequences. Furthermore, the attention-based relationships learned by the jointly trained ReLSO models provide a potential avenue towards sequence-level fitness attribution information.

The space of possible proteins is vast, and optimizing proteins for specific target properties computationally is an ongoing challenge, even with large amounts of data. Castro and colleagues combine a transformer-based model with regularized prediction heads to form a smooth and pseudoconvex latent space that allows for easier navigation and more efficient optimization of proteins.
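The idea of optimizing within a latent space using gradient-based methods can be illustrated with a minimal toy sketch. This is not the ReLSO implementation: the transformer encoder and decoder are omitted entirely, and the fitness prediction head is replaced by an assumed concave quadratic surrogate with a known optimum. The names `predicted_fitness`, `fitness_gradient`, `optimize_latent` and `z_star` are hypothetical and chosen only for illustration.

```python
import numpy as np


def predicted_fitness(z, z_star):
    """Toy stand-in for a learned fitness head: a concave quadratic peaking at z_star."""
    return -float(np.sum((z - z_star) ** 2))


def fitness_gradient(z, z_star):
    """Analytic gradient of the toy surrogate with respect to the latent point z."""
    return -2.0 * (z - z_star)


def optimize_latent(z0, z_star, step_size=0.1, n_steps=50):
    """Plain gradient ascent on predicted fitness, recording fitness at every step."""
    z = np.array(z0, dtype=float)
    trace = [predicted_fitness(z, z_star)]
    for _ in range(n_steps):
        z = z + step_size * fitness_gradient(z, z_star)
        trace.append(predicted_fitness(z, z_star))
    return z, trace


rng = np.random.default_rng(0)
z_star = rng.normal(size=8)  # assumed location of the fitness optimum in latent space
z0 = rng.normal(size=8)      # latent embedding of a hypothetical starting sequence
z_final, fitness_trace = optimize_latent(z0, z_star)

# Average gain in predicted fitness per optimization step over the whole run
gain_per_step = (fitness_trace[-1] - fitness_trace[0]) / (len(fitness_trace) - 1)
```

In a full pipeline of the kind the abstract describes, the optimized latent point would then be decoded back into a sequence. A smooth, pseudoconvex landscape is what lets a simple first-order update like this one make steady progress, which is why the toy surrogate here is deliberately concave.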