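Computer science
Workflow
Pipeline (software)
Directed evolution
Function (biology)
Protein sequencing
Sequence (biology)
Protein engineering
Computational biology
Data mining
Biology
Database
Peptide sequence
Gene
Genetics
Programming language
Mutant
Biochemistry
Enzyme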
Authors
Yueming Long, Ariane Mora, Francesca-Zhoufan Li, Emre Gürsoy, Kadina E. Johnston, Frances H. Arnold
Identifier
DOI:10.1021/acssynbio.4c00625
Abstract
Sequence-function data provides valuable information about the protein functional landscape but is rarely obtained during directed evolution campaigns. Here, we present Long-read every variant Sequencing (LevSeq), a pipeline that combines a dual barcoding strategy with nanopore sequencing to rapidly generate sequence-function data for entire protein-coding genes. LevSeq integrates into existing protein engineering workflows and comes with open-source software for data analysis and visualization. The pipeline facilitates data-driven protein engineering by consolidating sequence-function data to inform directed evolution and provide the requisite data for machine learning-guided protein engineering (MLPE). LevSeq enables quality control of mutagenesis libraries prior to screening, which reduces time and resource costs. Simulation studies demonstrate LevSeq's ability to accurately detect variants under various experimental conditions. Finally, we show LevSeq's utility in engineering protoglobins for new-to-nature chemistry. Widespread adoption of LevSeq and sharing of the data will enhance our understanding of protein sequence-function landscapes and empower data-driven directed evolution.