Transformer-based protein generation with regularized latent space optimization

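Autoencoder, Computer science, Fitness function, Artificial intelligence, Fitness landscape, Sequence space, Transformer, Protein sequencing, Sequence (biology), Machine learning, Deep learning, Genetic algorithm, Mathematics, Biology, Peptide sequence, Genetics, Gene, Physics, Banach space, Sociology, Demography, Quantum mechanics, Voltage, Pure mathematics, Population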

Authors

Egbert Castro, Abhinav Godavarthi, Julian Rubinfien, Kevin B. Givechian, Dhananjay Bhaskar, Smita Krishnaswamy

Source

Journal: Nature Machine Intelligence (Nature Portfolio)
Date: 2022-09-26; Volume (issue): 4 (10), pp. 840-851; Citations: 42

Links

nature.com, doi.org

Identifiers

DOI: 10.1038/s42256-022-00532-1

Abstract

The development of powerful natural language models has improved the ability to learn meaningful representations of protein sequences. In addition, advances in high-throughput mutagenesis, directed evolution and next-generation sequencing have allowed for the accumulation of large amounts of labelled fitness data. Leveraging these two trends, we introduce Regularized Latent Space Optimization (ReLSO), a deep transformer-based autoencoder, which features a highly structured latent space that is trained to jointly generate sequences as well as predict fitness. Through regularized prediction heads, ReLSO introduces a powerful protein sequence encoder and a novel approach for efficient fitness landscape traversal. Using ReLSO, we explicitly model the sequence–function landscape of large labelled datasets and generate new molecules by optimizing within the latent space using gradient-based methods. We evaluate this approach on several publicly available protein datasets, including variant sets of anti-ranibizumab and green fluorescent protein. We observe a greater sequence optimization efficiency (increase in fitness per optimization step) using ReLSO compared with other approaches, where ReLSO more robustly generates high-fitness sequences. Furthermore, the attention-based relationships learned by the jointly trained ReLSO models provide a potential avenue towards sequence-level fitness attribution information.

The space of possible proteins is vast, and optimizing proteins for specific target properties computationally is an ongoing challenge, even with large amounts of data. Castro and colleagues combine a transformer-based model with regularized prediction heads to form a smooth and pseudoconvex latent space that allows for easier navigation and more efficient optimization of proteins.
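The abstract describes optimizing directly in a learned latent space by following the gradient of a fitness prediction head, and it measures "optimization efficiency" as the increase in fitness per optimization step. The sketch below illustrates only that latent-space step under loudly stated assumptions. It is not the ReLSO implementation. The transformer encoder and decoder are omitted entirely, and the trained, regularized prediction head is replaced by a small randomly initialised MLP. `FitnessHead`, `optimize_latent` and `optimization_efficiency` are hypothetical names introduced here. Decoding the optimized latent point back into a protein sequence is not shown.

```python
import numpy as np


class FitnessHead:
    """Hypothetical stand-in for a fitness prediction head: f(z) = v . tanh(W z + b).

    In ReLSO the head is trained jointly with a transformer autoencoder;
    here the weights are random and only illustrate a smooth, differentiable
    map from latent vectors to a scalar fitness prediction.
    """

    def __init__(self, latent_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.5, size=(hidden_dim, latent_dim))
        self.b = rng.normal(scale=0.1, size=hidden_dim)
        self.v = rng.normal(size=hidden_dim)

    def predict(self, z: np.ndarray) -> float:
        return float(self.v @ np.tanh(self.W @ z + self.b))

    def grad(self, z: np.ndarray) -> np.ndarray:
        # Analytic gradient: d/dz [v . tanh(Wz + b)] = W^T (v * (1 - tanh^2(Wz + b)))
        h = np.tanh(self.W @ z + self.b)
        return self.W.T @ (self.v * (1.0 - h ** 2))


def optimize_latent(head: FitnessHead, z0: np.ndarray,
                    step_size: float = 0.05, n_steps: int = 50):
    """Plain gradient ascent on predicted fitness, starting from latent point z0.

    Returns the final latent vector and the predicted fitness at every step
    (including the starting point), so the trajectory has n_steps + 1 entries.
    """
    z = np.array(z0, dtype=float)
    trajectory = [head.predict(z)]
    for _ in range(n_steps):
        z = z + step_size * head.grad(z)
        trajectory.append(head.predict(z))
    return z, trajectory


def optimization_efficiency(trajectory) -> float:
    """Increase in predicted fitness per optimization step, as defined in the abstract."""
    return (trajectory[-1] - trajectory[0]) / (len(trajectory) - 1)


head = FitnessHead(latent_dim=8, hidden_dim=16, seed=0)
z_start = np.zeros(8)
z_best, traj = optimize_latent(head, z_start)
print(f"predicted fitness {traj[0]:.3f} -> {traj[-1]:.3f}, "
      f"efficiency {optimization_efficiency(traj):.4f} per step")
```

Because the toy head is smooth and the step size is small, each ascent step raises the predicted fitness. The abstract's point is that regularizing the latent space during training makes a real learned landscape behave similarly (smooth and pseudoconvex), so that simple gradient steps like these remain productive.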
