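Rank (graph theory)
Ranking (information retrieval)
Computer science
Support vector machine
Function (biology)
Artificial intelligence
Peptide
Pattern recognition (psychology)
Data mining
Computational biology
Machine learning
Mathematics
Biology
Genetics
Biochemistry
Combinatorics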
Authors
Samaneh Azari, Jun Zhang, Bing Xue, Lifeng Peng
Identifier
DOI:10.1109/cec.2019.8790049
Abstract
The analysis of tandem mass spectrometry (MS/MS) proteomics data relies on automated methods that assign peptides to observed MS/MS spectra. Typically, these methods return a list of candidate peptide-spectrum matches (PSMs), ranked according to a scoring function. Normally, the highest-scoring candidate peptide is considered the best match for each spectrum. However, these best matches are not necessarily the true matches. It is crucial that peptide identification tools identify the correct full-length peptide, and a spectrum should not be assigned to a peptide that is not expressed in the given biological sample. Therefore, in this paper, we present a new approach to improving the existing ordering of the PSMs, aiming to bring the correct PSM for each spectrum ahead of all the incorrect ones for the same spectrum. We develop a new method called GP-PSM-rank, which employs genetic programming (GP) to learn a ranking function by combining different feature functions that measure the quality of PSMs from different perspectives. We compare GP-PSM-rank with SVM-rank. The results show that GP-PSM-rank outperforms SVM-rank in terms of the number of identified peptides that are true matches. On a validation dataset with 120 spectra, the proposed method is used as a post-processing step on the peptide identification results of two de novo sequencing algorithms. GP-PSM-rank improves the results of both de novo methods in terms of identifying the true matches.
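The re-ranking idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GP-PSM-rank: the feature names (`search_score`, `ion_match`, `mass_error`), the `PSM` class, and the hand-written `example_ranker` expression are all hypothetical stand-ins for whatever feature functions and evolved ranking function the paper actually uses. The sketch only shows how a function that combines several features could reorder the candidates for one spectrum.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PSM:
    """One candidate peptide-spectrum match (hypothetical structure)."""
    spectrum_id: str
    peptide: str
    features: Dict[str, float]  # hypothetical feature names and values
    is_correct: bool = False    # ground-truth label, used only for evaluation


def example_ranker(f: Dict[str, float]) -> float:
    # Hand-written stand-in for an expression that GP might evolve;
    # the real method would learn such a combination from training data.
    return f["ion_match"] * f["search_score"] - f["mass_error"]


def rerank(psms: List[PSM],
           ranker: Callable[[Dict[str, float]], float]) -> Dict[str, List[PSM]]:
    """Group PSMs by spectrum and sort each group by ranker score, best first."""
    by_spectrum: Dict[str, List[PSM]] = defaultdict(list)
    for psm in psms:
        by_spectrum[psm.spectrum_id].append(psm)
    return {
        sid: sorted(cands, key=lambda p: ranker(p.features), reverse=True)
        for sid, cands in by_spectrum.items()
    }


def top1_correct(ranked: Dict[str, List[PSM]]) -> int:
    """Count spectra whose top-ranked candidate is the true match."""
    return sum(1 for cands in ranked.values() if cands[0].is_correct)


# Made-up candidates for a single spectrum: the incorrect peptide has the
# higher original search score, but the combined function prefers the other.
psms = [
    PSM("s1", "PEPTIDEA",
        {"search_score": 0.9, "ion_match": 0.4, "mass_error": 0.3}),
    PSM("s1", "PEPTIDEB",
        {"search_score": 0.8, "ion_match": 0.7, "mass_error": 0.1},
        is_correct=True),
]
ranked = rerank(psms, example_ranker)
print([p.peptide for p in ranked["s1"]], top1_correct(ranked))
```

In this toy data, ordering by `search_score` alone would place the incorrect candidate first, while the combined expression moves the labelled-correct candidate to the top. That is the kind of improvement the abstract measures as the number of identified peptides that are true matches.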