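Matthews correlation coefficient
Genetic programming
Experimental setup
Artificial intelligence
Ensemble learning
Protein sequencing
Computer science
Peptide
Computational biology
Machine learning
Random forest
Gene expression programming
Peptide sequence
Support vector machine
Gene
Chemistry
Biology
Biochemistry
Authors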
Shima Shafiee,Abdolhossein Fathi,Ghazaleh Taherzadeh
Identifier
DOI:10.1109/tcbb.2022.3230540
Abstract
Peptide-binding proteins play significant roles in processes such as gene expression, metabolism, signal transmission, and DNA (deoxyribonucleic acid) repair and replication. Identifying the binding residues in protein-peptide complexes, especially from sequence alone, is challenging both experimentally and computationally. Although several computational approaches have been introduced to predict these binding residues, there is still ample room to improve prediction performance. In this work, we introduce a novel ensemble machine learning-based approach called SPPPred (Sequence-based Protein-Peptide binding residue Prediction) to predict protein-peptide binding residues. First, we extract relevant sequence information and employ a genetic programming algorithm for feature construction to find more distinctive features. We then build an ensemble-based machine learning classifier to predict binding residues. The proposed method shows consistent and comparable performance on both ten-fold cross-validation and an independent test set. SPPPred yields an F-Measure (F-M), Accuracy (ACC), and Matthews’ Correlation Coefficient (MCC) of 0.310, 0.949, and 0.230, respectively, on the independent test set, outperforming competing methods by up to approximately 9%. SPPPred is publicly available at https://github.com/GTaherzadeh/SPPPred.git .
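
The three metrics reported in the abstract can all be derived from a binary confusion matrix. The sketch below is not taken from the SPPPred code; the function name `binary_metrics` and the example counts are hypothetical, chosen only to illustrate how a heavily imbalanced residue-level task (few binding residues, many non-binding ones) can show high accuracy alongside much lower F-Measure and MCC.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute F-Measure, accuracy, and MCC from confusion-matrix counts.

    Hypothetical helper for illustration; not part of SPPPred.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F-Measure: harmonic mean of precision and recall.
    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # Accuracy: fraction of all residues labelled correctly.
    accuracy = (tp + tn) / (tp + fp + tn + fn)

    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"f_measure": f_measure, "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Made-up counts: 100 binding residues among 10,000 in total.
    for name, value in binary_metrics(tp=30, fp=70, tn=9830, fn=70).items():
        print(f"{name}: {value:.3f}")
```

Because true negatives dominate the accuracy numerator when binding residues are rare, accuracy alone says little about how well the binding class is recovered, which is presumably why F-Measure and MCC are reported next to it.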