Residue (chemistry)
Mechanism (biology)
Peptide
Chemistry
Computational biology
Computer science
Biochemistry
Biology
Physics
Quantum mechanics
Authors
Jun Hu,Kaixin Chen,B. Dharma Rao,Maha A. Thafar,Somayah Albaradei,Muhammad Arif
Abstract
Accurate identification of protein-peptide binding residues is essential for understanding protein-peptide interactions and advancing drug discovery. To address this problem, extensive research efforts have been made to design more discriminative feature representations. However, extracting these explicit features usually depends on third-party tools, which results in low computational efficiency and low predictive performance. In this study, we design an end-to-end deep learning-based method, E2EPep, for protein-peptide binding residue prediction using protein sequence only. E2EPep first employs and fine-tunes two state-of-the-art pre-trained protein language models that extract two different latent feature representations from protein sequences. A novel feature fusion module is then designed in E2EPep to fuse and optimize these two feature representations of binding residues. In addition, we have also designed E2EPep+, which integrates the E2EPep and PepBCL models, to improve the prediction performance. Experimental results on two independent test datasets show that the mean AUC and mean MCC values of E2EPep and E2EPep+ are significantly higher than those of most existing sequence-based methods and are comparable to those of state-of-the-art structure-based predictors. Detailed data analysis shows that the primary strength of E2EPep lies in the effectiveness of its feature representation, which uses a cross-attention mechanism to fuse the embeddings generated by the two fine-tuned protein language models.
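The cross-attention fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the E2EPep implementation: the function names (`cross_attention`, `fuse`), the toy embeddings, and the choice to concatenate both attention directions are assumptions made for illustration. Learned query/key/value projections are omitted, so both embeddings are assumed to share the same dimension; in practice, embeddings from two different language models would first be projected to a common size.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Scaled dot-product cross-attention without learned projections.

    queries:     per-residue vectors from one model (L x d)
    keys_values: per-residue vectors from the other model (L x d),
                 used as both keys and values (a simplification).
    """
    d = len(queries[0])
    scale = math.sqrt(d)
    attended = []
    for q in queries:
        # Similarity of this residue's query to every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in keys_values]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        attended.append([
            sum(w * v[j] for w, v in zip(weights, keys_values))
            for j in range(len(keys_values[0]))
        ])
    return attended

def fuse(emb_a, emb_b):
    """Hypothetical fusion: attend in both directions, then concatenate."""
    a_from_b = cross_attention(emb_a, emb_b)
    b_from_a = cross_attention(emb_b, emb_a)
    return [row_a + row_b for row_a, row_b in zip(a_from_b, b_from_a)]

# Toy embeddings for a 3-residue sequence (values are made up).
emb_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emb_b = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
fused = fuse(emb_a, emb_b)  # one fused vector of length 4 per residue
```

Each row of `fused` could then be passed to a per-residue classifier that outputs a binding probability; that classifier is outside the scope of this sketch.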