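Computer science
Proteome
Domain (mathematical analysis)
Human Proteome Project
Computational biology
Proteomics
Scale (ratio)
Protein structure
Machine learning
Biology
Data science
Data mining
Bioinformatics
Chemistry
Mathematics
Biochemistry
Mathematical analysis
Physics
Quantum mechanics
Gene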
Authors
Kathryn Tunyasuvunakool, Jonas Adler, Zachary Wu, Tim Green, Michał Zieliński, Augustin Žídek, Alex Bridgland, Andrew Cowie, Clemens Meyer, Agata Laydon, Sameer Velankar, Gerard J. Kleywegt, Alex Bateman, K Taki, Alexander Pritzel, Michael Figurnov, Olaf Ronneberger, Russ Bates, Simon Köhl, Anna Potapenko
Source
Journal: Nature
[Springer Nature]
Date: 2021-07-22
Volume (issue), pages: 596 (7873): 590-596
Citations: 2927
Identifiers
DOI: 10.1038/s41586-021-03828-1
Abstract
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold [2], at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.