DeepDigest: Prediction of Protein Proteolytic Digestion with Deep Learning

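Protease, Chemistry, Proteomics, Trypsin, Chymotrypsin, Proteolytic enzyme, Computational biology, Shotgun proteomics, Cleavage (geology), Biochemistry, Artificial intelligence, Machine learning, Computer science, Biology, Paleontology, Gene, Fracture (geology)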
Authors
Jinghan Yang, Zhiqiang Gao, Xiuhan Ren, Jie Sheng, Ping Xu, Cheng Chang, Yan Fu
Source
Journal: Analytical Chemistry [American Chemical Society]
Volume (issue): 93 (15): 6094-6103; Cited by: 28
Identifier
DOI: 10.1021/acs.analchem.0c04704
Abstract

Proteolytic digestion of proteins by one or more proteases is a key step in shotgun proteomics, in which the proteolytic products, i.e., peptides, serve as surrogates for their parent proteins in subsequent qualitative or quantitative analysis. Proteases generally cleave proteins at specific amino acid residues, but digestion is rarely complete: missed cleavage sites are widespread. Accurately modeling and predicting the digestion behavior of proteases would therefore greatly help improve both experimental design and downstream data analysis. At present, systematic studies of the proteases commonly used in proteomics are insufficient, and easy-to-use tools for predicting the cleavage sites of different proteases are lacking. Here, we propose DeepDigest, a novel sequence-based deep learning algorithm that integrates convolutional neural networks and long short-term memory networks to predict protein digestion. DeepDigest predicts the cleavage probability of each potential cleavage site in a protein sequence for eight popular proteases: trypsin, ArgC, chymotrypsin, GluC, LysC, AspN, LysN, and LysargiNase. We compared DeepDigest with three traditional machine learning algorithms: logistic regression, random forest, and support vector machine. On the eight training data sets, the 10-fold cross-validation AUCs of DeepDigest were 0.956–0.982, significantly higher than those of the three traditional algorithms. On the 11 independent test data sets, DeepDigest achieved AUCs between 0.849 and 0.978, outperforming the traditional algorithms in most cases. Transfer learning further improved the prediction accuracy. In addition, several interesting characteristics of the different proteases were revealed and discussed.
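To make "each potential cleavage site" concrete, the sketch below enumerates candidate sites under a trypsin-like rule and cuts a fixed-length sequence window around each one, the kind of input a sequence-based CNN/LSTM classifier could consume. It is a minimal illustration under stated assumptions, not DeepDigest's actual preprocessing: the function names, the flank length, the padding symbol `_`, and the one-hot encoding are hypothetical choices, and every K or R (other than the last residue) is treated as a candidate without applying the proline exception.

```python
from typing import FrozenSet, List

# Hypothetical helpers for illustration only; not DeepDigest's preprocessing.
TRYPSIN_RESIDUES: FrozenSet[str] = frozenset("KR")  # trypsin cleaves after K/R
PAD = "_"  # padding symbol for windows that run past a protein terminus
ALPHABET = "ACDEFGHIKLMNPQRSTVWY" + PAD  # 20 standard residues + padding


def candidate_sites(protein: str, residues: FrozenSet[str] = TRYPSIN_RESIDUES) -> List[int]:
    """Return 0-based indices i where the bond between protein[i] and
    protein[i + 1] is a potential cleavage site. The last residue is
    skipped because there is no bond after it; the proline exception
    is deliberately not applied."""
    return [i for i, aa in enumerate(protein[:-1]) if aa in residues]


def site_window(protein: str, i: int, flank: int = 7) -> str:
    """Window centred on the bond after residue i: `flank` residues ending
    with protein[i], then `flank` residues starting at protein[i + 1],
    padded with PAD where the protein runs out."""
    left = protein[max(0, i + 1 - flank): i + 1].rjust(flank, PAD)
    right = protein[i + 1: i + 1 + flank].ljust(flank, PAD)
    return left + right


def one_hot(window: str) -> List[List[int]]:
    """One-hot encode a window over ALPHABET; unknown residues become all-zero rows."""
    return [[1 if aa == a else 0 for a in ALPHABET] for aa in window]


if __name__ == "__main__":
    protein = "MAKPLRGEK"
    for site in candidate_sites(protein):
        print(site, site_window(protein, site, flank=4))
```

On `MAKPLRGEK` this prints `2 _MAKPLRG` and `5 KPLRGEK_`; the final K is skipped because no bond follows it. For proteases that cleave N-terminal to their target residue, such as AspN (D), LysN (K), or LysargiNase (K and R), the candidate bond sits before the matching residue, so `candidate_sites` would need to test `protein[i + 1]` instead of `protein[i]`.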
Finally, as an application, we used DeepDigest to predict the digestibility of peptides and demonstrated that peptide digestibility is an informative new feature for discriminating between correct and incorrect peptide identifications.
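One simple way to turn per-site cleavage probabilities into a peptide-level digestibility score is sketched below: a peptide is released only if both of its boundary bonds are cleaved and every candidate site inside it is missed. The independence assumption, the function name `peptide_digestibility`, and the `site_probs` dictionary layout are hypothetical and not taken from the paper; the probabilities would come from whatever site-level predictor is available.

```python
from typing import Dict


def peptide_digestibility(protein: str, start: int, end: int,
                          site_probs: Dict[int, float]) -> float:
    """Probability that protein[start:end] is released intact, assuming
    independent cleavage events (a hypothetical simplification).

    site_probs maps a 0-based residue index i to the predicted probability
    that the bond between protein[i] and protein[i + 1] is cleaved.
    Protein termini need no cleavage, so they contribute a factor of 1;
    a boundary bond absent from site_probs is treated as never cleaved.
    """
    if not 0 <= start < end <= len(protein):
        raise ValueError("invalid peptide span")
    # N-terminal boundary: the bond after residue start - 1 must be cleaved.
    p_n = 1.0 if start == 0 else site_probs.get(start - 1, 0.0)
    # C-terminal boundary: the bond after residue end - 1 must be cleaved.
    p_c = 1.0 if end == len(protein) else site_probs.get(end - 1, 0.0)
    # Every candidate site strictly inside the peptide must be missed.
    p_missed = 1.0
    for i, p in site_probs.items():
        if start <= i < end - 1:
            p_missed *= 1.0 - p
    return p_n * p_c * p_missed
```

For example, with hypothetical site probabilities `{2: 0.9, 5: 0.6}` on `MAKPLRGEK`, the fully cleaved peptide `PLR` scores 0.9 × 0.6 = 0.54, while the missed-cleavage peptide `MAKPLR` scores 1 × 0.6 × (1 − 0.9) = 0.06. A score like this could then be supplied as one extra feature when separating correct from incorrect peptide identifications.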