Homology modeling
Structural genomics
Protein domain
Sequence alignment
Structural similarity
Structural motif
Computational biology
Structural alignment
Structural Classification of Proteins database
Protein superfamily
Homology (biology)
Protein family
Protein structure
Multiple sequence alignment
Phylogenetic tree
UniProt
Biology
Peptide sequence
Genetics
Biochemistry
Amino acid
Gene
Enzyme
Authors
Jimin Pei, Antonina Andreeva, Sara Chuguransky, Beatriz Lázaro, Typhaine Paysan-Lafosse, R. Dustin Schaeffer, Alex Bateman, Qian Cong, Nick V. Grishin
Identifier
DOI:10.1016/j.jmb.2024.168764
Abstract
Classification of protein domains based on homology and structural similarity serves as a fundamental tool to gain biological insights into protein function. Recent advancements in protein structure prediction, exemplified by AlphaFold, have revolutionized the availability of protein structural data. We focus on classifying about 9000 Pfam families into ECOD (Evolutionary Classification of Domains) by using predicted AlphaFold models and the DPAM (Domain Parser for AlphaFold Models) tool. Our results offer insights into their homologous relationships and domain boundaries. More than half of these Pfam families contain DPAM domains that can be confidently assigned to the ECOD hierarchy. Most assigned domains belong to highly populated folds such as Immunoglobulin-like (IgL), Armadillo (ARM), helix-turn-helix (HTH), and Src homology 3 (SH3). A large fraction of DPAM domains, however, cannot be confidently assigned to ECOD homologous groups. These unassigned domains exhibit statistically different characteristics, including shorter average length, fewer secondary structure elements, and more abundant transmembrane segments. They could potentially define novel families remotely related to domains with known structures or novel superfamilies and folds. Manual scrutiny of a subset of these domains revealed an abundance of internal duplications and recurring structural motifs. Exploring sequence and structural features such as disulfide bond patterns, metal-binding sites, and enzyme active sites helped uncover novel structural folds as well as remote evolutionary relationships. By bridging the gap between sequence-based Pfam and structure-based ECOD domain classifications, our study contributes to a more comprehensive understanding of the protein universe by providing structural and functional insights into previously uncharacterized proteins.