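UniProt
Encyclopedia
Protein domain
Computer science
Structural bioinformatics
Domain (mathematical analysis)
Computational biology
Protein folding
Sequence (biology)
Set (abstract data type)
Protein structure
Biology
Mathematics
Mathematical analysis
Biochemistry
Library science
Gene
Genetics
Programming language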
Authors
Andy M. Lau, Nicola Bordin, Shaun M. Kandathil, Ian Sillitoe, Vaishali Waman, Jude Wells, Christine Orengo, David T. Jones
Identifier
DOI: 10.1101/2024.03.18.585509
Abstract
The AlphaFold Protein Structure Database (AFDB) contains full-length predictions of the three-dimensional structures of almost every protein in UniProt. Because protein function is closely linked to structure, the AFDB is poised to revolutionise our understanding of biology, evolution and more. Protein structures are composed of domains: independently folding units that can be found in multiple structural contexts and functional roles. However, the AFDB's potential remains untapped because of the difficulty of characterising its 200 million structures. Here we present The Encyclopedia of Domains (TED), which combines state-of-the-art deep learning-based domain parsing and structure comparison algorithms to segment and classify domains across the whole AFDB. TED describes over 370 million domains, over 100 million more than are detectable by sequence-based methods. Nearly 80% of TED domains share similarities with known superfamilies in CATH, greatly expanding the set of known protein structural domains. We uncover over 10,000 previously unseen structural interactions between superfamilies, expand domain coverage to over 1 million taxa, and unveil thousands of architectures and folds across the unexplored continuum of protein fold space. We expect TED to be a valuable resource that provides a functional interface to the AFDB, making it useful for a multitude of downstream analyses.