Biology
Gene
Gene expression
Genetics
Arabidopsis
Solanum
Regulatory sequence
Regulation of gene expression
Computational biology
Genome
Plant
Mutant
Authors
Fritz Forbang Peleke,Simon Zumkeller,Mehmet Gültaş,Armin O. Schmitt,Jędrzej Szymański
Identifier
DOI:10.1038/s41467-024-47744-0
Abstract
Elucidating the relationship between non-coding regulatory element sequences and gene expression is crucial for understanding gene regulation and genetic variation. We explored this link with the training of interpretable deep learning models predicting gene expression profiles from gene flanking regions of the plant species Arabidopsis thaliana, Solanum lycopersicum, Sorghum bicolor, and Zea mays. With over 80% accuracy, our models enabled predictive feature selection, highlighting e.g. the significant role of UTR regions in determining gene expression levels. The models demonstrated remarkable cross-species performance, effectively identifying both conserved and species-specific regulatory sequence features and their predictive power for gene expression. We illustrated the application of our approach by revealing causal links between genetic variation and gene expression changes across fourteen tomato genomes. Lastly, our models efficiently predicted genotype-specific expression of key functional gene groups, exemplified by underscoring known phenotypic and metabolic differences between Solanum lycopersicum and its wild, drought-resistant relative, Solanum pennellii.