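Signal peptide
False positive paradox
Threading (protein sequence)
Signal (programming language)
Loop modeling
Proteome
Transit (satellite)
Protein sequencing
Protein structure
Protein structure prediction
Peptide sequence
Biology
Computational biology
Bioinformatics
Biochemistry
Computer science
Artificial intelligence
Gene
Programming language
Law
Political science
Public transport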
Authors
Venkata R. Sanaboyana, Adrian H. Elcock
Identifier
DOI:10.1016/j.jmb.2023.168393
Abstract
Many proteins contain cleavable signal or transit peptides that direct them to their final subcellular locations. Such peptides are usually predicted from sequence alone using methods such as TargetP 2.0 and SignalP 6.0. While these methods are usually very accurate, we show here that an analysis of a protein's AlphaFold2-predicted structure can often be used to identify false positive predictions. We start by showing that when given a protein’s full-length sequence, AlphaFold2 builds experimentally annotated signal and transit peptides in orientations that point away from the main body of the protein. This indicates that AlphaFold2 correctly identifies that a signal is not destined to be part of the mature protein’s structure and suggests, as a corollary, that predicted signals that AlphaFold2 folds with high confidence into the main body of the protein are likely to be false positives. To explore this idea, we analyzed predicted signal peptides in 48 proteomes made available in DeepMind’s AlphaFold2 database (https://alphafold.ebi.ac.uk). Applying TargetP 2.0 and SignalP 6.0 to the 561,562 proteins in the database results in 95,236 being predicted to contain a cleavable signal or transit peptide. In 95.1% of these cases, the AlphaFold2 structure of the full-length protein is fully consistent with the prediction of TargetP 2.0 or SignalP 6.0. In the remaining 4.9% of cases where the AlphaFold2 structure does not appear consistent with the prediction, the signal is often only predicted with low confidence. The potential false positives identified here may be useful for training even more accurate signal prediction methods.
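The consistency check described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not state their criteria, so the contact cutoff, the sequence gap, the pLDDT threshold, and both function names below are hypothetical choices made for illustration only. The sketch assumes the C-alpha coordinates and per-residue pLDDT values have already been read from an AlphaFold2 model, and that the predicted signal occupies the first `signal_length` residues.

```python
import math


def signal_contact_summary(ca_coords, plddt, signal_length,
                           contact_cutoff=8.0, seq_gap=5):
    """Summarize how a predicted N-terminal signal sits in a model.

    Returns (contact_fraction, mean_signal_plddt), where contact_fraction
    is the share of signal residues whose C-alpha atom lies within
    contact_cutoff angstroms of any C-alpha atom in the mature body.
    The first seq_gap residues after the cleavage site are skipped so
    that sequence neighbors do not count as contacts.
    All defaults are illustrative assumptions, not published values.
    """
    n = len(ca_coords)
    if len(plddt) != n:
        raise ValueError("ca_coords and plddt must have the same length")
    if not 0 < signal_length < n:
        raise ValueError("signal_length must be between 1 and n - 1")

    body_start = min(signal_length + seq_gap, n)
    body = ca_coords[body_start:]

    in_contact = 0
    for i in range(signal_length):
        if any(math.dist(ca_coords[i], xyz) <= contact_cutoff for xyz in body):
            in_contact += 1

    contact_fraction = in_contact / signal_length
    mean_signal_plddt = sum(plddt[:signal_length]) / signal_length
    return contact_fraction, mean_signal_plddt


def is_possible_false_positive(ca_coords, plddt, signal_length,
                               min_contact_fraction=0.5,
                               min_mean_plddt=70.0):
    """Flag a predicted signal that is confidently packed against the body.

    Both thresholds are hypothetical; a real analysis would need to
    calibrate them against experimentally annotated signal peptides.
    """
    fraction, mean_plddt = signal_contact_summary(
        ca_coords, plddt, signal_length)
    return fraction >= min_contact_fraction and mean_plddt >= min_mean_plddt
```

A signal that extends away from the rest of the chain gives a contact fraction near zero and is not flagged, whereas a segment that is modeled with high confidence inside the fold is flagged as a candidate false positive for manual review.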