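Epistasis
Point mutation
Mutation
Stability (learning theory)
Benchmark (surveying)
Computational biology
Computer science
Biology
Genetics
Machine learning
Gene
Geodesy
Geography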
Authors
Henry Dieckhaus, Brian Kuhlman
Identifier
DOI:10.1101/2024.08.20.608844
Abstract
There is strong interest in accurate methods for predicting changes in protein stability resulting from amino acid mutations to the protein sequence. Recombinant proteins must often be stabilized to be used as therapeutics or reagents, and destabilizing mutations are implicated in a variety of diseases. Due to increased data availability and improved modeling techniques, recent studies have shown advancements in predicting changes in protein stability when a single point mutation is made. Less focus has been directed toward predicting changes in protein stability when there are two or more mutations, despite the significance of mutation clusters for disease pathways and protein design studies. Here, we analyze the largest available dataset of double point mutation stability and benchmark several widely used protein stability models on this and other datasets. We identify a blind spot in how predictors are typically evaluated on multiple mutations, finding that, contrary to assumptions in the field, current stability models are unable to consistently capture epistatic interactions between double mutations. We observe one notable deviation from this trend, which is that epistasis-aware models provide marginally better predictions on stabilizing double point mutations. We develop an extension of the ThermoMPNN framework for double mutant modeling as well as a novel data augmentation scheme which mitigates some of the limitations in available datasets. Collectively, our findings indicate that current protein stability models fail to capture the nuanced epistatic interactions between concurrent mutations due to several factors, including training dataset limitations and insufficient model sensitivity.
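To make the notion of epistasis between two mutations concrete, the following is a minimal sketch that assumes the common definition: the deviation of the double-mutant stability change from the sum of the two single-mutant changes. The function name `epistasis` and the numeric values are hypothetical illustrations, not taken from the paper, and the sign convention for ΔΔG differs between datasets.

```python
def epistasis(ddg_double: float, ddg_a: float, ddg_b: float) -> float:
    """Return the non-additive part of a double-mutant stability change.

    Assumes epistasis is defined as ddG(A+B) - (ddG(A) + ddG(B)).
    A value of zero means the two mutations combine additively.
    """
    return ddg_double - (ddg_a + ddg_b)


# Hypothetical ddG values in kcal/mol, chosen only for illustration.
ddg_a = 1.0    # single mutant A
ddg_b = 0.5    # single mutant B
ddg_ab = 2.5   # measured double mutant A+B

additive_prediction = ddg_a + ddg_b           # what an additive model predicts
deviation = epistasis(ddg_ab, ddg_a, ddg_b)   # what an additive model misses

print(f"additive prediction: {additive_prediction}")
print(f"epistasis: {deviation}")
```

Under this definition, a predictor that simply adds its single-mutant outputs has zero epistasis by construction, so it can score well on double mutants overall while carrying no information about the interaction term.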