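Phosphorylation
Computational biology
Protein phosphorylation
Serine
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
Coronavirus disease 2019 (COVID-19)
Biology
Bioinformatics
Artificial intelligence
Computer science
Medicine
Infectious disease (medical specialty)
Disease
Biochemistry
Protein kinase A
Pathology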
Authors
Yong Li, Ru Gao, Shan Liu, Hongqi Zhang, Hao Lv, Hongyan Lai
Source
Journal: Methods
[Elsevier]
Date: 2024-08-23
Volume/Issue: 230: 140-146
Identifier
DOI: 10.1016/j.ymeth.2024.08.004
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single-stranded RNA virus that mainly causes respiratory and enteric diseases and is responsible for the outbreak of coronavirus disease 2019 (COVID-19). Numerous studies have demonstrated that SARS-CoV-2 infection leads to significant dysregulation of the protein post-translational modification profile in human cells. Accurate recognition of phosphorylation sites in host cells will contribute to a deeper understanding of the pathogenic mechanisms of SARS-CoV-2 and help to screen drugs and compounds with antiviral potential. Therefore, there is a need to develop cost-effective and high-precision computational strategies for specifically identifying phosphorylation sites in SARS-CoV-2-infected cells. In this work, we first implemented a custom neural network model (named PhosBERT) on the basis of ProtBert, a pre-trained protein language model developed through self-supervised learning on the Bidirectional Encoder Representations from Transformers (BERT) architecture. PhosBERT was then trained and validated separately on a serine (S) and threonine (T) phosphorylation dataset and a tyrosine (Y) phosphorylation dataset, each with 5-fold cross-validation. Independent validation results showed that PhosBERT identified S/T phosphorylation sites with an accuracy of 81.9% and an AUC (area under the receiver operating characteristic curve) of 0.896. For Y phosphorylation sites, the prediction accuracy and AUC reached 87.1% and 0.902, respectively. These results indicate that the proposed model has good predictive ability and stability and could provide a new approach for studying SARS-CoV-2 phosphorylation sites.
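The evaluation protocol named in the abstract (5-fold cross-validation, accuracy, and AUC) can be illustrated with a minimal, standard-library-only Python sketch. This is not the PhosBERT implementation: the function names `stratified_kfold`, `accuracy`, and `roc_auc` are hypothetical, the 0.5 decision threshold and the stratified splitting are assumptions not stated in the abstract, and the scores in the example are toy values rather than results from the paper.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class balance in each fold.

    Assumption: stratified splitting; the abstract only says "5-fold".
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of sites classified correctly at an assumed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive site outscores a random
    negative site, with ties counted as one half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = phosphorylation site, 0 = non-site (made-up scores).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_acc = accuracy(toy_labels, toy_scores)  # 0.5
toy_auc = roc_auc(toy_labels, toy_scores)   # 0.75
```

In a real pipeline, each `(train_idx, test_idx)` pair would select the sequences used to fit and score a classifier, and the per-fold accuracy and AUC would be averaged. The pairwise AUC above is quadratic in the number of sites, which is fine for illustration but slow for large datasets.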