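Solubility
Prioritization
Escherichia coli
Computer science
Gradient boosting
Protein sequencing
Boosting (machine learning)
Machine learning
Protein expression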
Test set
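Computational biology
Data mining
Database
Chemistry
Biology
Peptide sequence
Biochemistry
Engineering
Random forest
Gene
Organic chemistry
Management science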
Authors
Jiří Hon, Martin Marusiak, Tomáš Martínek, Antonín Kunka, Jaroslav Zendulka, David Bednář, Jiří Damborský
Source
Journal: Bioinformatics
[Oxford University Press]
Date: 2021-01-01
Volume (issue), pages: 37 (1): 23-28
Citations: 72
Identifier
DOI: 10.1093/bioinformatics/btaa1102
Abstract
Motivation: Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritization of highly soluble proteins.
Results: A new tool for sequence-based prediction of soluble protein expression in E. coli, SoluProt, was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt’s accuracy of 58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility prediction tools. There is also evidence that it could significantly increase the success rate of experimental protein studies. SoluProt is freely available as a standalone program and a user-friendly webserver at https://loschmidt.chemi.muni.cz/soluprot/.
Availability and implementation: https://loschmidt.chemi.muni.cz/soluprot/.
Supplementary information: Supplementary data are available at Bioinformatics online.