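Chemistry
Shotgun proteomics
Fragment (logic)
Proteomics
Fragmentation (computing)
Bioinformatics
NIST
Database search engine
Computational biology
Artificial intelligence
Biochemistry
Search engine
Information retrieval
Algorithm
Natural language processing
Programming language
Computer science
Biology
Gene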
Authors
Joel Lapin, Xinjian Yan, Qian Dong
Identifier
DOI: 10.1021/acs.analchem.3c02321
Abstract
We present UniSpec, an attention-driven deep neural network that predicts comprehensive collision-induced fragmentation spectra to improve peptide identification in shotgun proteomics. Trained on 1.8 million unique high-quality tandem mass spectra (MS2) from 0.8 million unique peptide ions, UniSpec learned a peptide fragmentation dictionary of 7919 fragment peaks. Of these, 5712 are neutral loss peaks, and 2310 of those correspond to modification-specific neutral losses. UniSpec can predict 73%–77% of the fragment intensity in our NIST reference library spectra, a substantial increase over the 35%–45% covered by b and y ions alone. Comparisons with Prosit show that both models predict their respective fragment ion series well, but UniSpec performs notably better at generating complex MS2 spectra with diverse ion annotations. Integrating UniSpec's predictions into shotgun proteomics data analysis increases the number of identified tryptic peptides by 48% at a 1% false discovery rate (FDR) and by 60% at the more stringent 0.1% FDR. Searches against UniSpec's predicted in silico spectral library closely matched the results obtained with conventional search engines and experimental spectral libraries, highlighting its potential as a stand-alone identification tool. The source code and Python scripts are available on GitHub (https://github.com/usnistgov/UniSpec) and Zenodo (https://zenodo.org/records/10452792), and all data sets and analysis results generated in this work were deposited in Zenodo (https://zenodo.org/records/10052268).
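The intensity-coverage figures in the abstract (the share of spectral intensity explained by b and y ions versus a larger fragment dictionary) can be illustrated with a small calculation. The following is a minimal Python sketch, not UniSpec's code. It assumes singly charged, unmodified b/y ions, monoisotopic residue masses, and a fixed 0.02 Da matching tolerance. The function names `by_ion_mz` and `annotated_intensity_fraction` and the toy spectrum are hypothetical.

```python
# Hypothetical sketch: NOT UniSpec's implementation.
# Assumes unmodified residues, singly charged ions, and a fixed Da tolerance.

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def by_ion_mz(peptide: str) -> list[float]:
    """Return singly charged b1..b(n-1) and y1..y(n-1) m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)           # b_i: N-terminal fragment
        ions.append(sum(masses[-i:]) + WATER + PROTON)  # y_i: C-terminal fragment
    return ions


def annotated_intensity_fraction(
    peaks: list[tuple[float, float]],
    theoretical_mz: list[float],
    tol: float = 0.02,
) -> float:
    """Fraction of total intensity in peaks lying within `tol` Da of any theoretical ion."""
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        return 0.0
    matched = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - t) <= tol for t in theoretical_mz)
    )
    return matched / total


if __name__ == "__main__":
    # Toy spectrum (m/z, intensity): b2 and y1 of PEPTIDE plus one unexplained peak.
    toy_peaks = [(227.1026, 50.0), (148.0604, 30.0), (500.0, 20.0)]
    print(annotated_intensity_fraction(toy_peaks, by_ion_mz("PEPTIDE")))  # → 0.8
```

In principle, adding m/z values for other ion types (for example, neutral-loss peaks) to `theoretical_mz` raises the covered fraction. How UniSpec defines and annotates its 7919 dictionary peaks is not reproduced in this sketch.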