Epistasis
Sequence (biology)
Computational biology
Mutation
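Sequence space
Biology
Selection (genetic algorithm)
Genetics
Protein sequencing
Protein superfamily
Sequence alignment
Computer science
Peptide sequence
Gene
Artificial intelligence
Mathematics
Pure mathematics
Banach space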
Authors
Simona Cocco,Lorenzo Posani,Rémi Monasson
Identifier
DOI:10.1073/pnas.2312335121
Abstract
Predicting the effects of one or more mutations on the in vivo or in vitro properties of a wild-type protein is a major computational challenge because of epistasis, that is, interactions between amino acids in the sequence. We introduce a computationally efficient procedure for building minimal epistatic models that predict mutational effects by combining evolutionary data (homologous sequences) with a small amount of mutational-scan data. Mutagenesis measurements guide the selection of links in a sparse graphical model, while the parameters on the nodes and edges are inferred from sequence data. We show, on 10 mutational scans, that our pipeline achieves performance comparable to state-of-the-art deep networks trained on far more data, while requiring far fewer parameters and hence being more interpretable. In particular, the identified interactions adapt to the wild-type protein and to the experimentally measured fitness or biochemical property, focus mostly on key functional sites, and are not necessarily related to structural contacts. Our method is therefore able to extract, from homologous sequence data reflecting the multitude of structural and functional constraints acting on proteins throughout evolution, the information relevant to a single mutational experiment.
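The abstract does not spell out the model's equations. As an illustration only, the sketch below assumes a Potts-type sparse graphical model, a common choice for protein sequence data: fields `h[i][a]` on the nodes (positions) and couplings `J[(i, j)][(a, b)]` only on a small set of selected edges. Under this assumption, a mutational effect is scored as the energy change relative to the wild type, and epistasis appears as the non-additive part of a double mutant's score, which only an edge between the two mutated sites can produce. All names (`energy`, `mutate`, `delta_e`) and the toy parameters are hypothetical; the sketch does not reproduce the authors' parameter inference from homologous sequences or their mutagenesis-guided link selection.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse Potts-type model (an assumption, not the paper's code):
# h[i][a]            field for amino acid a at position i (missing entries = 0)
# J[(i, j)][(a, b)]  coupling on a selected edge i < j (missing entries = 0)
Fields = List[Dict[str, float]]
Couplings = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def energy(seq: str, h: Fields, J: Couplings) -> float:
    """E(s) = -sum_i h_i(s_i) - sum over selected edges (i, j) of J_ij(s_i, s_j)."""
    e = -sum(h[i].get(a, 0.0) for i, a in enumerate(seq))
    for (i, j), table in J.items():  # only the sparse set of edges is visited
        e -= table.get((seq[i], seq[j]), 0.0)
    return e


def mutate(seq: str, mutations: List[Tuple[int, str]]) -> str:
    """Apply substitutions given as (position, new_amino_acid) pairs."""
    s = list(seq)
    for pos, aa in mutations:
        s[pos] = aa
    return "".join(s)


def delta_e(wt: str, mutations: List[Tuple[int, str]], h: Fields, J: Couplings) -> float:
    """Predicted effect as the energy change vs. wild type (positive = less favorable, by assumed sign convention)."""
    return energy(mutate(wt, mutations), h, J) - energy(wt, h, J)


if __name__ == "__main__":
    # Toy 3-residue example with a single selected edge between positions 0 and 2.
    wt = "ACD"
    h = [{"A": 1.0, "G": 0.2}, {"C": 0.5}, {"D": 0.8, "E": 0.3}]
    J = {(0, 2): {("A", "D"): 0.5, ("G", "E"): 1.2}}
    d0 = delta_e(wt, [(0, "G")], h, J)
    d2 = delta_e(wt, [(2, "E")], h, J)
    d02 = delta_e(wt, [(0, "G"), (2, "E")], h, J)
    # Epistasis: deviation of the double mutant from the sum of single effects.
    print(round(d0, 3), round(d2, 3), round(d02 - d0 - d2, 3))  # 1.3 1.0 -1.7
```

Because couplings are stored only for selected edges, scoring a mutant costs time proportional to the sequence length plus the number of edges. Two mutations at positions not joined by an edge are predicted to combine additively.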