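Scalability
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
Computer science
Workflow
Mutation
Coronavirus disease 2019 (COVID-19)
Computational biology
Construct (Python library)
2019-20 coronavirus outbreak
Pandemic
Speedup
Biology
Genetics
Gene
Virology
Medicine
Parallel computing
Database
Disease
Pathology
Outbreak
Infectious disease (medical specialty)
Programming language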
Authors
Jie Chen, Zhiwei Nie, Yu Wang, Kai Wang, Fan Xu, Yaqin Hu, Bing Zheng, Zhennan Wang, Guoli Song, Jingyi Zhang, Jie Fu, Xiansong Huang, Zhongqi Wang, Zhixiang Ren, Qiankun Wang, Daixi Li, Dong‐Qing Wei, Bin Zhou, Changqing Yang, Yonghong Tian
Identifier
DOI:10.1177/10943420231188077
Abstract
The continual emergence of SARS-CoV-2 variants of concern (VOCs) has challenged pandemic control worldwide. To develop effective drugs and vaccines, one needs to efficiently simulate SARS-CoV-2 spike receptor-binding domain (RBD) mutations and identify high-risk variants. We pretrain a large protein language model on approximately 408 million protein sequences and construct a high-throughput screening workflow for predicting binding affinity and antibody escape. As the first work on SARS-CoV-2 RBD mutation simulation, we successfully identify mutations in the RBD regions of 5 VOCs and can screen millions of potential variants in seconds. Our workflow scales to 4096 NPUs with 96.5% scalability and a 493.9× speedup in mixed-precision computing, while achieving a peak performance of 366.8 PFLOPS (34.9% of the theoretical peak) on Pengcheng Cloudbrain-II. Our method paves the way for simulating coronavirus evolution to prepare for the inevitable next pandemic. Our models are released at https://github.com/ZhiweiNiepku/SARS-CoV-2_mutation_simulation to facilitate future related work.
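The scaling figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is not taken from the paper: it assumes that "scalability" means the measured speedup divided by the factor by which the NPU count grew, and that the 34.9% figure is the ratio of achieved to theoretical peak throughput. The function names are hypothetical and only the five input numbers come from the abstract.

```python
# Figures quoted in the abstract.
NPUS = 4096              # largest run, in NPUs
SPEEDUP = 493.9          # reported speedup at 4096 NPUs
SCALABILITY = 0.965      # reported scalability (96.5%)
PEAK_PFLOPS = 366.8      # achieved peak performance
PEAK_FRACTION = 0.349    # achieved fraction of theoretical peak


def implied_baseline_npus(npus: int, speedup: float, scalability: float) -> float:
    """Baseline NPU count implied by the figures.

    Assumption (ours, not stated in the abstract):
    scalability = speedup / (npus / baseline_npus).
    """
    scale_factor = speedup / scalability   # how much the machine grew
    return npus / scale_factor


def implied_theoretical_peak(peak_pflops: float, fraction: float) -> float:
    """Theoretical peak in PFLOPS implied by the achieved fraction."""
    return peak_pflops / fraction


baseline = implied_baseline_npus(NPUS, SPEEDUP, SCALABILITY)
theoretical = implied_theoretical_peak(PEAK_PFLOPS, PEAK_FRACTION)
per_npu_tflops = theoretical * 1000 / NPUS   # 1 PFLOPS = 1000 TFLOPS

print(f"implied baseline: about {baseline:.1f} NPUs")
print(f"implied theoretical peak: about {theoretical:.0f} PFLOPS")
print(f"implied per-NPU peak: about {per_npu_tflops:.0f} TFLOPS")
```

Under these assumptions the numbers are mutually consistent: the speedup would be measured against a run on roughly 8 NPUs, and the machine's theoretical peak would be a little over 1000 PFLOPS. If the paper defines scalability differently, the implied baseline changes accordingly.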