计算机科学
人工智能
特征(语言学)
机器学习
深度学习
加权
特征学习
卷积神经网络
数据挖掘
语言学
医学
放射科
哲学
作者
Long Chen,Rining Wu,Feixiang Zhou,Qiang Zhang,Jian K. Liu
标识
DOI:10.1186/s13321-023-00788-8
摘要
Abstract The solubility of proteins stands as a pivotal factor in the realm of pharmaceutical research and production. Addressing the imperative to enhance production efficiency and curtail experimental costs, the demand arises for computational models adept at accurately predicting solubility based on provided datasets. Prior investigations have leveraged deep learning models and feature engineering techniques to distill features from raw protein sequences for solubility prediction. However, these methodologies have not thoroughly delved into the interdependencies among features or their respective magnitudes of significance. This study introduces HybridGCN, a pioneering Hybrid Graph Convolutional Network that elevates solubility prediction accuracy through the combination of diverse features, encompassing sophisticated deep-learning features and classical biophysical features. An exploration into the intricate interplay between deep-learning features and biophysical features revealed that specific biophysical attributes, notably evolutionary features, complement features extracted by advanced deep-learning models. Augmenting the model’s capability for feature representation, we employed ESM, a substantial protein language model, to derive a zero-shot learning feature capturing comprehensive and pertinent information concerning protein functions and structures. Furthermore, we proposed a novel feature fusion module termed Adaptive Feature Re-weighting (AFR) to integrate multiple features, thereby enabling the fine-tuning of feature importance. Ablation experiments and comparative analyses attest to the efficacy of the HybridGCN approach, culminating in state-of-the-art performances on the public eSOL and S. cerevisiae datasets.
科研通智能强力驱动
Strongly Powered by AbleSci AI