ORFs
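Genome
Biology
Proteome
Computational biology
Annotation
Proteomics
Genetics
Gene annotation
Proteogenomics
Gene prediction
Whole genome sequencing
Genome project
Genomics
Open reading frame
Gene
Peptide sequence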
Authors
Jacob D. Jaffe,Howard C. Berg,George M. Church
Source
Journal: Proteomics
[Wiley]
Date: 2004-01-01
Volume (issue): 4 (1): 59-77
Citations: 345
Identifier
DOI:10.1002/pmic.200300511
Abstract
The accelerated rate of genomic sequencing has led to an abundance of completely sequenced genomes. Annotation of the open reading frames (ORFs) (i.e., gene prediction) in these genomes is an important task and is most often performed computationally based on features in the nucleic acid sequence. Using recent advances in proteomics, we set out to predict the set of ORFs for an organism based principally on expressed protein-based evidence. Using a novel search strategy, we mapped peptides detected in a whole-cell lysate of Mycoplasma pneumoniae onto a genomic scaffold and extended these "hits" into ORFs bound by traditional genetic signals to generate a "proteogenomic map". We were able to generate an ORF model for M. pneumoniae strain FH using proteomic data with a high correlation to models based on sequence features. Ultimately, we detected over 81% of the genomically predicted ORFs in M. pneumoniae strain M129 (the originally sequenced strain). We were also able to detect several new ORFs not originally predicted by genomic methods, various N-terminal extensions, and some evidence that would suggest that certain predicted ORFs are bogus. Some of these differences may be a result of the strain analyzed but demonstrate the robustness of protein analysis across closely related genomes. This technique is a cost-effective means to add value to genome annotation, and a prerequisite for proteome quantitation and in vivo interaction measures.
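The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' search strategy: it assumes exact string matching of an already identified peptide against a six-frame translation, and it extends each hit only to the flanking stop codons without choosing a start codon. The function names (`translate`, `map_peptide`) are hypothetical. The codon table is NCBI translation table 4, in which TGA encodes tryptophan, as in Mycoplasma.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 4 (Mycoplasma/Spiroplasma): TGA encodes Trp, not stop.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_peptide(genome, peptide):
    """Yield (strand, start, end, orf_protein) for every exact peptide hit.

    Each hit is extended to the nearest upstream and downstream stop codon
    in the same frame (a stop-to-stop ORF). Coordinates are 0-based and
    half-open on the strand that was read; for '-' they refer to the
    reverse-complemented sequence.
    """
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                left = protein.rfind("*", 0, pos) + 1  # residue after previous stop
                right = protein.find("*", pos)          # next stop, if any
                if right == -1:
                    right = len(protein)
                yield strand, frame + 3 * left, frame + 3 * right, protein[left:right]
                pos = protein.find(peptide, pos + 1)
```

For example, `list(map_peptide("TAAATGAAATGAGGTCTGTAG", "KWG"))` finds one hit on the forward strand, and the internal TGA is read as tryptophan rather than as a stop. A realistic pipeline would also need to score spectra, tolerate sequence differences between strains, and select start codons.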