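Transferability
Solubility
Stacking
Support vector machine
Chemistry
Multilayer perceptron
Construct (Python library)
Graph
Biological system
Artificial intelligence
Computer science
Biology
Organic chemistry
Theoretical computer science
Artificial neural network
Machine learning
Reuter
Programming language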
Authors
Hyukjin Kwon, Zhenjiao Du, Yonghui Li
Identifier
DOI:10.1016/j.ijbiomac.2024.134601
Abstract
Accurate protein solubility prediction is crucial for screening suitable candidates for food applications. Existing models often rely only on sequences and overlook important structural details. In this study, a regression model for protein solubility was developed using both the sequences and the predicted structures of 2983 E. coli proteins. Sequence- and structure-level properties of the proteins were extracted bioinformatically and used as input to a multilayer perceptron (MLP). In addition, residue-level features and contact maps were used to construct a graph convolutional network (GCN). The out-of-fold predictions of the two models were combined and fed into multiple meta-regressors to build a stacking model. The stacking model with a support vector regressor (SVR) as the meta-regressor achieved R² of 0.502 and 0.468 on the test and external validation datasets, respectively, outperforming existing regression models. Its improved performance over its base models indicates that the stacking model effectively captured the strengths of the base models as well as the significance of the different features used. Furthermore, the model's transferability was validated indirectly on a dataset of seed storage proteins classified using the Osborne definition, and in a case study using molecular dynamics simulation, suggesting potential for application beyond microbial proteins to food- and agriculture-related proteins.
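To illustrate what "out-of-fold predictions fed into a meta-regressor" means, here is a minimal sketch on synthetic data, not the authors' pipeline. Assumptions: closed-form ridge regression stands in for both base models (the MLP and the GCN) and for the SVR meta-regressor; the two feature matrices `X_a` and `X_b` are hypothetical stand-ins for the sequence/structure-level features and the graph-derived features; all helper names are illustrative. The key idea is that each training sample's level-1 feature is predicted by a base model that never saw that sample, so the meta-regressor learns how to weight the base models without being fooled by their training-set fit.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept (stand-in model)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

def predict_ridge(model, X):
    w, b = model
    return X @ w + b

def out_of_fold_predictions(X, y, n_splits=5, seed=0):
    """Predict every sample with a model trained on the other folds only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    oof = np.empty(len(y))
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), val_idx)
        model = fit_ridge(X[train_idx], y[train_idx])
        oof[val_idx] = predict_ridge(model, X[val_idx])
    return oof

def fit_stacking(feature_views, y, n_splits=5):
    """feature_views: one feature matrix per base model, rows aligned with y."""
    # Level-1 training data: one column of out-of-fold predictions per base model.
    meta_X = np.column_stack([out_of_fold_predictions(X, y, n_splits) for X in feature_views])
    meta_model = fit_ridge(meta_X, y, alpha=1e-3)
    # Base models are refit on all training rows for use at prediction time.
    base_models = [fit_ridge(X, y) for X in feature_views]
    return base_models, meta_model

def predict_stacking(stack, feature_views):
    base_models, meta_model = stack
    meta_X = np.column_stack([predict_ridge(m, X) for m, X in zip(base_models, feature_views)])
    return predict_ridge(meta_model, meta_X)

# Synthetic example: each feature view carries a different part of the signal.
rng = np.random.default_rng(42)
n = 400
X_a = rng.normal(size=(n, 5))  # hypothetical sequence/structure-level features
X_b = rng.normal(size=(n, 5))  # hypothetical graph-derived features
y = X_a[:, 0] + X_b[:, 0] + 0.3 * rng.normal(size=n)
train, test = slice(0, 300), slice(300, None)

stack = fit_stacking([X_a[train], X_b[train]], y[train])
pred = predict_stacking(stack, [X_a[test], X_b[test]])
r2_stack = r2_score(y[test], pred)
r2_a = r2_score(y[test], predict_ridge(fit_ridge(X_a[train], y[train]), X_a[test]))
print(f"single view R2 = {r2_a:.3f}, stacked R2 = {r2_stack:.3f}")
```

Because the two synthetic views are complementary by construction, the stacked model scores higher than either view alone; this mirrors the abstract's argument about combining base models, but the numbers here say nothing about real protein data. Swapping in an actual MLP, GCN, and SVR would change only the fit/predict functions, not the out-of-fold structure.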