FCCCSR_Glu: a semi-supervised learning model based on FCCCSR algorithm for prediction of glutarylation sites

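Classifier (UML), Computer science, Artificial intelligence, Cluster analysis, Core (optical fiber), Pseudo amino acid composition, k-nearest neighbors algorithm, Pattern recognition (psychology), Identification (biology), Algorithm, Machine learning, Amino acid, Chemistry, Biology, Telecommunications, Biochemistry, Botany, Dipeptide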

Authors

Ning Qiao, Zedong Qi, Yue Wang, Ansheng Deng, Chen Chen

Source

Journal: Briefings in Bioinformatics [Oxford University Press]
Date: 2022-09-27 Volume (Issue): 23 (6) Citations: 4

Identifier

DOI: 10.1093/bib/bbac421

Abstract

Glutarylation is a post-translational modification that plays an irreplaceable role in various functions of the cell. Therefore, it is very important to accurately identify glutarylation substrates and their corresponding glutarylation sites. In recent years, many computational methods for predicting glutarylation sites have emerged, but they still have many limitations, among which noisy data and the class imbalance problem caused by the uncertainty of non-glutarylation sites are great challenges. In this study, we propose a new semi-supervised learning algorithm, named FCCCSR, to identify reliable non-glutarylation lysine sites from unlabeled samples as negative samples. FCCCSR first finds core objects from positive samples according to reverse nearest neighbor information, and then clusters the core objects based on natural neighbor structure. Finally, reliable negative samples are selected according to the clustering result. Building on the FCCCSR algorithm, we propose a new method named FCCCSR_Glu for glutarylation site identification. Multi-view features are extracted and fused to describe peptides, including amino acid composition, BLOSUM62, amino acid factors and composition of k-spaced amino acid pairs. Then, the reliable negative samples selected by FCCCSR and the positive samples are combined to establish models, and XGBoost optimized by a differential evolution algorithm is used as the classifier. On the independent testing dataset, FCCCSR_Glu achieves 85.18%, 98.36%, 94.31% and 0.8651 in sensitivity, specificity, accuracy and Matthews correlation coefficient, respectively, which is superior to state-of-the-art methods in predicting glutarylation sites. Therefore, FCCCSR_Glu can be a useful tool for glutarylation site prediction, and the FCCCSR algorithm can effectively select reliable negative samples from unlabeled samples. The data and code are available at https://github.com/xbbxhbc/FCCCSR_Glu.git
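Two of the feature views named in the abstract, amino acid composition and composition of k-spaced amino acid pairs, have simple standard definitions and can be sketched in a few lines of Python. This is an illustrative sketch only and is not taken from the authors' repository: the function names `aac`, `cksaap` and `fuse`, the default `k_max=2`, the choice to skip non-standard characters (such as padding symbols), and the normalization by the number of valid pairs are all assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(peptide):
    """Amino acid composition: relative frequency of each standard residue.

    Assumption: characters outside the 20 standard residues are ignored.
    """
    residues = [r for r in peptide if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def cksaap(peptide, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0 .. k_max.

    For each gap k, count residue pairs separated by k positions and
    normalize by the number of valid pairs (an assumption of this sketch).
    Returns 400 values per gap, concatenated in order of increasing k.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(peptide) - k - 1):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # skips pairs containing non-standard characters
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


def fuse(peptide, k_max=2):
    """Fuse the two views by simple concatenation (hypothetical fusion scheme)."""
    return aac(peptide) + cksaap(peptide, k_max)
```

With these assumptions, `fuse(peptide, k_max=2)` yields a vector of 20 + 3 × 400 = 1220 values per peptide. The BLOSUM62 and amino acid factor views mentioned in the abstract would be appended in the same way, but they depend on lookup tables that are not reproduced here.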
