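Citrullination
Computer science
Feature selection
Support vector machine
Artificial intelligence
Feature (linguistics)
Feature extraction
Pattern recognition (psychology)
Sequence (biology)
Data mining
Machine learning
Citrulline
Biology
Philosophy
Biochemistry
Amino acid
Genetics
Arginine
Linguistics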
Authors
Lína Zhang,Jin‐Gui Chen,Chengjin Zhang,Rui Gao,Runtao Yang
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2020-01-01
Volume and pages: 8: 88453-88463
Citations: 2
Identifier
DOI:10.1109/access.2020.2992672
Abstract
As a post-translational modification (PTM), protein citrullination is crucial in a diverse array of cellular processes and is implicated in a range of human pathologies. Accurate identification of protein citrullination sites (PCSs) is therefore urgently needed to illuminate the reaction details and the complex pathogenesis related to protein citrullination. In view of the limitations of existing PCS predictors, this study proposes a novel sequence-based combined method, named PCSPred_SC, to further enhance prediction performance. Various feature extraction methods are developed to mine sequence-derived biological information. Within the resulting feature space, the predictive capabilities of different prediction algorithms, over-sampling methods, and feature selection methods are each explored. Experimental results indicate that the over-sampling methods are effective in addressing the imbalanced dataset problem and that the feature selection methods play a significant role in removing irrelevant and redundant features. On the same dataset, using 10-fold cross-validation, PCSPred_SC, constructed by combining a support vector machine (SVM), Adasyn, and t-distributed stochastic neighbor embedding (t-SNE), achieves markedly better performance than the competing methods while substantially reducing the number of features used for the task. It is anticipated that the proposed method will provide significant information to broaden our knowledge of citrullination-related biological processes.
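The abstract does not say which sequence-derived features PCSPred_SC uses, so the following is only a minimal sketch of what sequence-based feature extraction for candidate sites could look like. It assumes a common convention for PTM site predictors: take a fixed-length window centred on each arginine (the residue that citrullination converts to citrulline) and describe it by its amino acid composition. The function names, the `flank` parameter, the `X` padding symbol, and the choice of composition as the feature are all illustrative assumptions, not the authors' method.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for windows that run past the sequence ends


def arginine_windows(sequence, flank=3):
    """Return (position, window) pairs centred on every arginine (R).

    `flank` is an assumed window half-width, not a value from the paper.
    """
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "R":
            # Residue i of `sequence` sits at index i + flank in `padded`,
            # so the window of length 2 * flank + 1 starts at index i.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


def composition(window):
    """Fraction of each standard amino acid among the non-padding residues."""
    counts = Counter(ch for ch in window if ch in AMINO_ACIDS)
    total = sum(counts.values())  # at least 1: every window contains its central R
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence, made up for illustration only.
sites = arginine_windows("MKRAGRL", flank=2)
features = [composition(window) for _, window in sites]
```

Each candidate site yields one 20-dimensional vector here. In a pipeline like the one the abstract describes, vectors of this kind would then pass through over-sampling, dimensionality reduction, and a classifier.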