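Biology
Computational biology
CpG site
Convolutional neural network
Genome
Genomics
Gene
Genetics
Sequence (biology)
Deep sequencing
Enhancer
Gene expression
Computer science
Artificial intelligence
DNA methylation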
Authors
Vikram Agarwal, Jay Shendure
Source
Journal: Cell Reports
[Elsevier]
Date: 2020-05-01
Volume (issue): 31 (7): 107663
Citations: 175
Identifier
DOI: 10.1016/j.celrep.2020.107663
Abstract
Algorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here, we sought to apply deep convolutional neural networks toward that goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, termed Xpresso, more than doubles the accuracy of alternative sequence-based models and isolates rules as predictive as models relying on chromatin immunoprecipitation sequencing (ChIP-seq) data. Xpresso recapitulates genome-wide patterns of transcriptional activity, and its residuals can be used to quantify the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose cell-type-specific gene-expression predictions based solely on primary sequences as a grand challenge for the field.
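To make the CpG finding concrete, the sketch below shows one simple way to quantify CpG dinucleotide content in a promoter sequence. It is not the Xpresso model or the authors' code; the function names `cpg_count` and `cpg_obs_exp` are hypothetical, and the observed/expected ratio used here is the conventional CpG-island statistic, assumed for illustration only.

```python
def cpg_count(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G")


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, since the
    expected count is then zero and the ratio is undefined.
    """
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return cpg_count(s) * len(s) / (c * g)


# Toy sequence (hypothetical, not taken from the paper).
promoter = "ACGCGT"
print(cpg_count(promoter))    # 2
print(cpg_obs_exp(promoter))  # 3.0
```

A feature like this could serve as a simple baseline input when comparing against a sequence-based deep model, though the abstract does not say the authors did so.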