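Computer science
Machine learning
Artificial intelligence
Interpretability
Epigenetics
Bottleneck
Metric (unit)
Robustness (evolution)
Binary classification
Generalizability theory
Data mining
Support vector machine
Statistics
Mathematics
Operations management
Economics
DNA methylation
Gene
Embedded system
Chemistry
Gene expression
Biochemistry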
Authors
Shushan Toneyan, Ziqi Tang, Peter K. Koo
Identifier
DOI:10.1101/2022.04.29.490059
Abstract
Deep learning has been successful at predicting epigenomic profiles from DNA sequences. Most approaches frame this task as a binary classification relying on peak callers to define functional activity. Recently, quantitative models have emerged to directly predict the experimental coverage values as a regression. As new models continue to emerge with different architectures and training configurations, a major bottleneck is forming due to the lack of ability to fairly assess the novelty of proposed models and their utility for downstream biological discovery. Here we introduce a unified evaluation framework and use it to compare various binary and quantitative models trained to predict chromatin accessibility data. We highlight various modeling choices that affect generalization performance, including a downstream application of predicting variant effects. In addition, we introduce a robustness metric that can be used to enhance model selection and improve variant effect predictions. Our empirical study largely supports that quantitative modeling of epigenomic profiles leads to better generalizability and interpretability.