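Biology
Genome
Computational biology
Genomics
Annotation
Genome project
Genetics
Comparative genomics
Regulatory sequence
Genome browser
Genome evolution
Gene
Transcription factor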
Authors
Xinkai Zhou,Tao Zhu,Wen Fang,Rongrong Yu,Zhaohui He,Dijun Chen
Identifier
DOI:10.1016/j.jgg.2022.04.003
Abstract
Plant genomes contain a large fraction of non-coding sequences. The discovery and annotation of conserved non-coding sequences (CNSs) in plants is an ongoing challenge. Here we report the application of comparative genomics to systematically identify CNSs in 50 well-annotated Gramineae genomes using rice (Oryza sativa) as the reference. We conduct multi-way whole-genome alignments to the rice genome. Based on these alignments, the rice genome is annotated with 20 conservation states (CSs) at single-nucleotide resolution using a multivariate hidden Markov model (ConsHMM). Different states show distinct enrichments for various genomic features, and the conservation scores of CSs are highly correlated with the level of associated chromatin accessibility. We find that at least 33.5% of the rice genome is under strong selection, with more than 70% of this sequence lying outside of coding regions. A catalog of 855,366 regulatory CNSs is generated, and these significantly overlap with putative active regulatory elements such as promoters, enhancers, and transcription factor binding sites. Collectively, our study provides a resource for elucidating functional non-coding regions of the rice genome and an evolutionary perspective on regulatory sequences in higher plants.