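Solubility
Generative grammar
Underdetermined system
Artificial intelligence
Computer science
Artificial neural network
Machine learning
Function (biology)
Protein sequencing
Biological system
Biochemical engineering
Chemistry
Algorithm
Engineering
Biochemistry
Biology
Peptide sequence
Organic chemistry
Gene
Evolutionary biology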
Authors
Han Xi,Liheng Zhang,Kang Zhou,Xiaonan Wang
Identifier
DOI:10.1016/j.compchemeng.2019.106533
Abstract
Protein solubility plays a critical role in improving the production yield of recombinant proteins in biocatalysis applications. To some extent, protein solubility can represent the function and activity of biocatalysts, which are mainly composed of recombinant proteins. In the literature, many machine learning models have been investigated to predict protein solubility from protein sequence, but the parameters of those models were underdetermined because protein solubility data were insufficient. Here we propose a deep neural network (DNN) as a more accurate regression predictive model. Moreover, to tackle the insufficient data problem, we propose a novel data augmentation algorithm, Protein Solubility Generative Adversarial Nets (ProGAN), for improving the prediction of protein solubility. After adding mimic data produced by ProGAN, the prediction performance measured by R² improved compared with that obtained without data augmentation. An R² value of 0.4504 was achieved, an improvement of about 10% over the previous study using the same dataset.
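The abstract reports prediction performance as a coefficient of determination. As a reminder of what that metric measures, here is a minimal sketch of the standard definition, 1 - SS_res / SS_tot, in plain Python. The function name `r2_score` and the sample values are illustrative assumptions; they are not taken from the paper, and the authors' own evaluation code may differ.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / len(y_true)

    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")

    return 1.0 - ss_res / ss_tot


# Made-up solubility values, for demonstration only.
measured = [0.10, 0.45, 0.80, 0.30]
predicted = [0.20, 0.40, 0.65, 0.35]
print(r2_score(measured, predicted))
```

A value of 1 means perfect prediction, and 0 means the model does no better than always predicting the mean of the measured values, which is the scale on which the reported 0.4504 sits.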