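Artificial intelligence
Computer science
Support vector machine
Machine learning
Random forest
Discriminant
Multilayer perceptron
Lasso (programming language)
Feature selection
Artificial neural network
Stacking
Classifier (UML)
Pattern recognition (psychology)
Chemistry
Organic chemistry
World Wide Web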
Authors
Bin Yu, Xue Wang, Yaqun Zhang, Hongli Gao, Yifei Wang, Yushuang Liu, Xin Gao
Identifier
DOI:10.1016/j.asoc.2022.108676
Abstract
RNA–protein interactions (RPI) play a crucial role in fundamental cellular physiological processes. Traditional methods for identifying RPI rely on expensive and labor-intensive biological experiments, and existing computational methods are far from satisfactory. There is a pressing need to develop more cost-effective methods to predict RPI. In this study, a stacking ensemble deep learning-based framework (named RPI-MDLStack) is constructed for RPI prediction. First, sequential, physicochemical, structural, and evolutionary information is obtained from RNA and protein sequences through eight feature extraction methods. Then, the optimal feature subset is generated by removing redundancy from the fused features with the least absolute shrinkage and selection operator (LASSO). Following the stacking strategy, the optimal features are first learned by a combination of base classifiers composed of multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), gated recurrent unit (GRU), and deep neural networks (DNN). Finally, the prediction scores are fed into a discriminative model for further training. In 5-fold cross-validation, RPI-MDLStack achieves accuracies of 96.7%, 87.3%, 94.6%, 97.1% and 89.5% on RPI488, RPI369, RPI2241, RPI1807, and RPI1446, respectively. Additionally, RPI-MDLStack achieves an overall prediction accuracy of 97.8% in independent tests when trained on RPI488. Compared with other state-of-the-art RPI prediction methods on the same datasets, RPI-MDLStack is more robust and stable in predicting RPI.
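The stacking strategy described in the abstract (base classifiers produce prediction scores, which a second-level model is then trained on) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. The data are synthetic stand-ins for fused feature vectors, the two base learners (`CentroidClassifier`, `KNNClassifier`) are toy substitutes for the MLP/SVM/RF/GRU/DNN combination, the logistic-regression meta-learner (`LogisticMeta`) is an assumption since the abstract only says "a discriminative model", and the LASSO selection step is omitted. All names are hypothetical.

```python
import math
import random


def make_data(n, seed=0):
    """Synthetic two-class data (assumption: stands in for fused RPI features)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


class CentroidClassifier:
    """Toy base learner: class-1 score from relative distance to class centroids."""
    def fit(self, X, y):
        self.centroids = []
        for c in (0, 1):
            rows = [x for x, t in zip(X, y) if t == c]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, X):
        scores = []
        for x in X:
            d0 = math.dist(x, self.centroids[0])
            d1 = math.dist(x, self.centroids[1])
            scores.append(d0 / (d0 + d1 + 1e-12))  # near 1 when closer to class 1
        return scores


class KNNClassifier:
    """Toy base learner: fraction of positives among the k nearest training points."""
    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_proba(self, X):
        scores = []
        for x in X:
            pairs = sorted(zip(self.X, self.y), key=lambda p: math.dist(x, p[0]))
            scores.append(sum(t for _, t in pairs[:self.k]) / self.k)
        return scores


class LogisticMeta:
    """Meta-learner (assumed): logistic regression via batch gradient descent."""
    def fit(self, Z, y, lr=0.5, epochs=500):
        self.w, self.b, n = [0.0] * len(Z[0]), 0.0, len(Z)
        for _ in range(epochs):
            gw, gb = [0.0] * len(self.w), 0.0
            for z, t in zip(Z, y):
                err = self._sigmoid(z) - t
                gb += err
                for j, v in enumerate(z):
                    gw[j] += err * v
            self.w = [w - lr * g / n for w, g in zip(self.w, gw)]
            self.b -= lr * gb / n
        return self

    def _sigmoid(self, z):
        s = self.b + sum(w * v for w, v in zip(self.w, z))
        return 1.0 / (1.0 + math.exp(-s))

    def predict(self, Z):
        return [1 if self._sigmoid(z) >= 0.5 else 0 for z in Z]


def out_of_fold_scores(make_model, X, y, n_folds=5):
    """Score every sample with a model that did not see it during training."""
    scores = [0.0] * len(X)
    for f in range(n_folds):
        test = [i for i in range(len(X)) if i % n_folds == f]
        train = [i for i in range(len(X)) if i % n_folds != f]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, s in zip(test, model.predict_proba([X[i] for i in test])):
            scores[i] = s
    return scores


def fit_stack(X, y, factories):
    # Level 1 input: one column of out-of-fold scores per base learner.
    Z = list(zip(*[out_of_fold_scores(f, X, y) for f in factories]))
    meta = LogisticMeta().fit(Z, y)
    bases = [f().fit(X, y) for f in factories]  # refit bases on all training data
    return bases, meta


def predict_stack(bases, meta, X):
    Z = list(zip(*[b.predict_proba(X) for b in bases]))
    return meta.predict(Z)


X, y = make_data(300, seed=0)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
bases, meta = fit_stack(X_train, y_train, [CentroidClassifier, lambda: KNNClassifier(5)])
pred = predict_stack(bases, meta, X_test)
accuracy = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
print(f"hold-out accuracy: {accuracy:.3f}")
```

Training the meta-learner on out-of-fold scores, rather than on scores the base learners produce for their own training samples, is the usual way to keep the second level from simply learning the base learners' overfitting. Whether RPI-MDLStack builds its second-level inputs exactly this way is not stated in the abstract.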