计算机科学
序列数据库
数据库搜索引擎
序列(生物学)
蛋白质测序
串联质谱法
吉祥物
数据挖掘
数据库索引
肽序列
搜索引擎
质谱法
人工智能
情报检索
搜索引擎索引
化学
基因
生物化学
法学
色谱法
政治学
作者
Changyong Yu,Guoren Wang,Yuhai Zhao,Keming Mao,Junjie Wu,Wendan Zhai
摘要
Large-scale, rapid and accurate protein identification is the crucial basis for further protein analysis in computational proteomics. Searching protein database by use of the protein tandem mass spectra has been a standard solution for solving this problem. Though several algorithms have been proposed, more sensitive and accurate approaches are still needed. In this paper, an effective database search approach is proposed. Prior to searching sequence database, an approach based on a graph-theoretic model is proposed to infer the peptide sequence tag (PST) from the tandem mass spectra data which is the partial sequence of the peptide. Also, an index approach for the protein sequence database is proposed for speeding up the database search and filtering out the incorrect protein sequences. Then, a novel scoring method for evaluating the match between the peptide sequence tag and the protein sequence is proposed for improving the accuracy of the database search result. Finally, we develop an algorithm for solving the problem and implement it as a computer program PepCheck. All the results fore-Check are compared with those of the famous algorithms. Experimental results demonstrate that PepCheck is as accurate as or more accurate than them with the test datasets.
科研通智能强力驱动
Strongly Powered by AbleSci AI