Genome
Biology
Gene
Protein domain
Organism
Genetics
Genome size
Shuffling
Evolutionary biology
Computational biology
Mathematics
Statistics
Authors
David Alvarez‐Ponce, Krishnamurthy Subramanian
Identifier
DOI:10.1073/pnas.2404332122
Abstract
In the pregenomic era, scientists were puzzled by the observation that haploid genome size (the C-value) did not correlate well with organismal complexity. This phenomenon, called the “C-value paradox,” is mostly explained by the fact that protein-coding genes occupy only a small fraction of eukaryotic genomes. When the first genome sequences became available, scientists were even more surprised by the fact that the number of genes (G-value) was also a poor predictor of complexity, which gave rise to the “G-value paradox.” The proposed explanations usually invoke mechanisms that increase the information content of each individual gene (protein–protein interactions, intrinsic disorder, posttranslational modifications, alternative splicing, etc.). Less attention has been paid to mechanisms that increase the amount of genetic material but do not increase (or not to the same extent) the amount of information encoded in the genome, such as gene duplication and domain shuffling. Proteins belonging to the same family and/or sharing the same domains often carry out similar or even redundant functions. We thus hypothesized that an organism’s number of different protein families and domains should be suitable predictors of organismal complexity. In agreement with our hypothesis, we observed that the number of protein families, clans, domains, and motifs increases from simple to progressively more complex organisms. In addition, these metrics correlate with the number of cell types better than and independently of the number of protein-coding genes and several previously proposed predictors of organismal complexity. Our observations have the potential to represent a resolution to the G-value paradox.