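Protein design
Computational biology
Computer science
Sequence alignment
Embedding
Protein sequencing
Protein engineering
Function (biology)
Biology
Protocol (science)
Sequence (biology)
Protein structure
Genetics
Peptide sequence
Artificial intelligence
Gene
Medicine
Biochemistry
Pathology
Enzyme
Alternative medicine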
Authors
Giulia Sormani,Zander Harteveld,Stéphane Rosset,Bruno E. Correia,Alessandro Laio
Abstract
Computational protein design has emerged as a powerful tool for identifying sequences compatible with pre-defined protein structures. The sequence design protocols implemented in the Rosetta suite have become widely used in the protein engineering community. To understand the strengths and limitations of the Rosetta design framework, we tested several design protocols on two distinct folds (SH3-1 and Ubiquitin). The sequence optimization, whether started from native structures with natural sequences or from polyvaline sequences, converges to sequences that standard bioinformatic tools, such as BLAST and HMMER, do not recognize as belonging to the fold family of the target protein. The sequences generated from both starting conditions (native and polyvaline) are instead very similar to each other and are recognized by HMMER as belonging to the same “family.” This demonstrates the capability of Rosetta to converge to similar sequences even when sampling from distinct starting conditions but, on the other hand, reveals an intrinsic inaccuracy of the scoring function, which drifts toward sequences that lack identifiable natural sequence signatures. To address this problem, we developed a protocol that embeds Rosetta Design simulations in a genetic algorithm, in which the sequence search is biased to converge to sequences that exist in nature. This protocol allows us to obtain sequences that have recognizable natural sequence signatures and, experimentally, the designed proteins are biochemically well behaved and thermodynamically stable.
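The idea of biasing a genetic-algorithm sequence search toward natural sequences can be illustrated with a toy sketch. This is not the authors' protocol and makes no Rosetta calls: `toy_energy` is a hypothetical stand-in for a design score, `identity_to_natural` is an assumed, simplistic measure of resemblance to natural sequences, and `bias_weight`, the mutation rate, and the selection scheme are all illustrative choices.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(seq):
    """Hypothetical stand-in for a design score (lower is better).

    Not a Rosetta energy: it only penalizes adjacent identical residues.
    """
    return sum(1.0 for a, b in zip(seq, seq[1:]) if a == b)


def identity_to_natural(seq, natural_seqs):
    """Highest fraction of identical positions against any natural sequence."""
    return max(
        sum(a == b for a, b in zip(seq, nat)) / len(seq) for nat in natural_seqs
    )


def fitness(seq, natural_seqs, bias_weight):
    """Higher is better: low toy energy plus a bias toward natural sequences."""
    return -toy_energy(seq) + bias_weight * identity_to_natural(seq, natural_seqs)


def mutate(seq, rng, rate=0.05):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def crossover(a, b, rng):
    """Single-point crossover of two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def run_ga(natural_seqs, length, generations=60, pop_size=40,
           bias_weight=10.0, seed=0):
    """Evolve a population from a polyvaline start; return the fittest sequence."""
    rng = random.Random(seed)
    population = ["V" * length for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(
            population,
            key=lambda s: fitness(s, natural_seqs, bias_weight),
            reverse=True,
        )
        parents = ranked[: pop_size // 2]  # the fitter half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda s: fitness(s, natural_seqs, bias_weight))
```

Calling `run_ga(natural_seqs, length=10)` with a list of equal-length reference strings returns the fittest sequence found. Because the fitter half of each generation is carried over unchanged, the best fitness never decreases; setting `bias_weight=0.0` removes the pull toward the reference sequences and leaves only the toy score.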