苏氨酸
计算生物学
交叉验证
蛋白质结构预测
计算机科学
分类器(UML)
人工智能
生物
蛋白质结构
遗传学
生物化学
磷酸化
丝氨酸
作者
Hua Tang,Qiang Tang,Qian Zhang,Pengmian Feng
标识
DOI:10.1016/j.ijbiomac.2023.124761
摘要
O-linked glycosylation is one of the most complex post-translational modifications (PTM) of human proteins modulating various cellular metabolic and signaling pathways. Unlike N-glycosylation, the O-glycosylation has non-specific sequence features and unstable glycan core structure, which makes identification of O-glycosites more challenging either by experimental or computational methods. Biochemical experiments to identify O-glycosites in batches are technically and economically demanding. Therefore, development of computation-based methods is greatly warranted. This study constructed a prediction model based on feature fusion for O-glycosites linked to the threonine residues in Homo sapiens. In the training model, we collected and sorted out high-quality human protein data with O-linked threonine glycosites. Seven feature coding methods were fused to represent the sample sequence. By comparison of different algorithms, random forest was selected as the final classifier to construct the classification model. Through 5-fold cross-validation, the proposed model, namely O-GlyThr, performed satisfactorily on both training set (AUC: 0.9308) and independent validation dataset (AUC: 0.9323). Compared with previously published predictors, O-GlyThr achieved the highest ACC of 0.8475 on the independent test dataset. These results demonstrated the high competency of our predictor in identifying O-glycosites on threonine residues. Furthermore, a user-friendly webserver named O-GlyThr (http://cbcb.cdutcm.edu.cn/O-GlyThr/) was developed to assist glycobiologists in the research associated with glycosylation structure and function.
科研通智能强力驱动
Strongly Powered by AbleSci AI