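Computer science
Graph
Embedding
Data mining
Calibration
Machine learning
Artificial intelligence
Theoretical computer science
Mathematics
Statistics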
Authors
Hongmei Wang,Long Zhao,Ziyuan Yu,Ximin Zeng,Shaoping Shi
Identifier
DOI:10.1002/pmic.202400210
Abstract
N‐linked glycosylation is crucial for various biological processes such as protein folding, immune response, and cellular transport. Traditional experimental methods for determining N‐linked glycosylation sites require substantial time and labor, which has led to the development of computational approaches as a more efficient alternative. However, because 3D structural data are scarce, existing prediction methods often struggle to fully utilize structural information and fall short in integrating sequence and structural information effectively. Motivated by the progress of protein pretrained language models (pLMs) and the breakthrough in protein structure prediction, we introduce a high‐accuracy model called CoNglyPred. Having compared various pLMs, we opt for the large‐scale pLM ESM‐2 to extract sequence embeddings, thus mitigating certain limitations associated with manual feature extraction. Meanwhile, our approach employs a graph transformer network to process the 3D protein structures predicted by AlphaFold2. The final graph output and the ESM‐2 embedding are integrated through a co‐attention mechanism. In a series of comprehensive experiments on the independent test dataset, CoNglyPred outperforms state‐of‐the‐art models and demonstrates exceptional performance in a case study. In addition, we are the first to report the uncertainty of N‐linked glycosylation predictors using expected calibration error and expected uncertainty calibration error.
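The abstract names expected calibration error (ECE) without defining it. As an illustration only, the sketch below computes the commonly used binned ECE for a binary site/non-site predictor: predictions are grouped by confidence, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The function name, the 0.5 decision threshold, the ten equal-width bins, and the use of max(p, 1 - p) as confidence are all assumptions; the abstract does not state how CoNglyPred's ECE was computed, and the expected uncertainty calibration error is not sketched here.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned ECE for a binary classifier (illustrative, not the paper's code).

    probs  : predicted probability of the positive class for each sample
    labels : ground-truth labels, 0 or 1
    n_bins : number of equal-width confidence bins over [0, 1] (assumed)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if len(probs) == 0:
        raise ValueError("at least one sample is required")

    bin_count = [0] * n_bins      # samples per bin
    bin_conf = [0.0] * n_bins     # summed confidence per bin
    bin_correct = [0] * n_bins    # correct predictions per bin

    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0             # assumed decision threshold
        conf = p if pred == 1 else 1.0 - p      # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to the last bin
        bin_count[idx] += 1
        bin_conf[idx] += conf
        bin_correct[idx] += int(pred == y)

    total = len(probs)
    ece = 0.0
    for count, conf_sum, correct in zip(bin_count, bin_conf, bin_correct):
        if count == 0:
            continue  # empty bins contribute nothing
        accuracy = correct / count
        mean_conf = conf_sum / count
        ece += (count / total) * abs(accuracy - mean_conf)
    return ece
```

A perfectly calibrated predictor gives a value near 0: ten predictions at probability 0.9, nine of which are correct, fall into one bin where accuracy and mean confidence agree. A predictor that is always fully confident and always wrong gives 1.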