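Interpretability
Computer science
Artificial intelligence
Machine learning
Language model
Deep learning
Bioinformatics
Rank (graph theory)
Biology
Biochemistry
Mathematics
Combinatorics
Gene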
Authors
Ziyi Zhou,Liang Zhang,Yuanxi Yu,Banghao Wu,Mingchen Li,Liang Hong,Pan Tan
Identifier
DOI:10.1038/s41467-024-49798-6
Abstract
Accurately modeling the protein fitness landscapes holds great importance for protein engineering. Pre-trained protein language models have achieved state-of-the-art performance in predicting protein fitness without wet-lab experimental data, but their accuracy and interpretability remain limited. On the other hand, traditional supervised deep learning models require abundant labeled training examples for performance improvements, posing a practical barrier. In this work, we introduce FSFP, a training strategy that can effectively optimize protein language models under extreme data scarcity for fitness prediction. By combining meta-transfer learning, learning to rank, and parameter-efficient fine-tuning, FSFP can significantly boost the performance of various protein language models using merely tens of labeled single-site mutants from the target protein. In silico benchmarks across 87 deep mutational scanning datasets demonstrate FSFP's superiority over both unsupervised and supervised baselines. Furthermore, we successfully apply FSFP to engineer the Phi29 DNA polymerase through wet-lab experiments, achieving a 25% increase in the positive rate. These results underscore the potential of our approach in aiding AI-guided protein engineering.
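The abstract names learning to rank as one ingredient of FSFP but does not say which ranking objective is used. As a minimal sketch only, the block below shows one common listwise choice, the ListMLE loss, which scores how well predicted values reproduce the ordering of measured fitness. The function name `listmle_loss` and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def listmle_loss(scores, fitness):
    """ListMLE: negative Plackett-Luce log-likelihood of the ordering
    induced by measured fitness, evaluated on the predicted scores.

    Assumption: this is an illustrative ranking objective, not
    necessarily the one used by FSFP.
    """
    if len(scores) != len(fitness):
        raise ValueError("scores and fitness must have the same length")

    # Order mutants from highest to lowest measured fitness.
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    ranked = [scores[i] for i in order]

    loss = 0.0
    for i in range(len(ranked)):
        tail = ranked[i:]
        # Log-sum-exp over the remaining items, shifted by the max for stability.
        m = max(tail)
        log_sum = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_sum - ranked[i]
    return loss


# Toy example: three single-site mutants with measured fitness values.
fitness = [0.1, 0.5, 0.9]
agreeing = [-1.0, 0.0, 1.0]   # predicted scores in the same order as fitness
reversed_ = [1.0, 0.0, -1.0]  # predicted scores in the opposite order

print(listmle_loss(agreeing, fitness) < listmle_loss(reversed_, fitness))
```

A ranking objective of this kind depends only on the relative order of the fitness labels, not on their scale, which is one reason ranking losses are attractive when only tens of labeled mutants are available.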