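Sequence (biology)
Computational biology
Computer science
Peptide
Classifier
Artificial intelligence
PDZ domain
Protein structure prediction
Machine learning
Protein structure
Data mining
Biology
Genetics
Biochemistry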
Authors
Amir Motmaen, Justas Dauparas, Minkyung Baek, Mohamad H. Abedi, David Baker, Philip Bradley
Identifier
DOI:10.1073/pnas.2216697120
Abstract
Peptide-binding proteins play key roles in biology, and predicting their binding specificity is a long-standing challenge. While considerable protein structural information is available, the most successful current methods use sequence information alone, in part because it has been a challenge to model the subtle structural changes accompanying sequence substitutions. Protein structure prediction networks such as AlphaFold model sequence-structure relationships very accurately, and we reasoned that if it were possible to specifically train such networks on binding data, more generalizable models could be created. We show that placing a classifier on top of the AlphaFold network and fine-tuning the combined network parameters for both classification and structure prediction accuracy leads to a model with strong generalizable performance on a wide range of Class I and Class II peptide-MHC interactions that approaches the overall performance of the state-of-the-art NetMHCpan sequence-based method. The peptide-MHC optimized model shows excellent performance in distinguishing binding and non-binding peptides to SH3 and PDZ domains. This ability to generalize well beyond the training set far exceeds that of sequence-only models and should be particularly powerful for systems where less experimental data are available.
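The joint objective described in the abstract (classification accuracy plus structure prediction accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the classifier's inputs or the loss weighting, so the logistic head, the scalar `confidence_feature`, the precomputed `structure_loss`, and the `class_weight` parameter below are all hypothetical names chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binding_probability(confidence_feature: float, slope: float, intercept: float) -> float:
    """Hypothetical classifier head: logistic regression on one scalar
    feature derived from the structure network's output (assumption)."""
    return sigmoid(slope * confidence_feature + intercept)


def binary_cross_entropy(p: float, label: int, eps: float = 1e-7) -> float:
    """Cross-entropy between predicted probability p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def combined_loss(
    structure_loss: float,
    confidence_feature: float,
    label: int,
    slope: float,
    intercept: float,
    class_weight: float = 1.0,
) -> float:
    """Weighted sum of a structure loss and a binder/non-binder
    classification loss. The weighting scheme is an assumption."""
    p = binding_probability(confidence_feature, slope, intercept)
    return structure_loss + class_weight * binary_cross_entropy(p, label)
```

In a real fine-tuning setup, both terms would be differentiable functions of the network parameters and would be minimized together, so that the classifier and the structure module are updated jointly rather than training the classifier on frozen predictions.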