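Chromatin
Computer science
Artificial intelligence
Diffusion
State (computer science)
Sequence (biology)
Machine learning
Algorithm
DNA
Biology
Genetics
Physics
Thermodynamics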
Authors
Yuhang Liu, Zixuan Wang, Jicheng Lv, Yongqing Zhang
Identifier
DOI:10.1007/978-981-99-8435-0_15
Abstract
Chromatin state reflects distinct biological roles of the genome and can systematically characterize regulatory elements and their functional interactions. Despite extensive computational studies, accurate prediction of chromatin state remains a challenge because of long-tailed class imbalance. Here, we propose a deep-learning framework, DeepChrom, to predict long-tailed chromatin states directly from DNA sequence. The framework includes a diffusion-based model that balances the samples of different classes by generating pseudo-samples, and a novel dilated CNN-based model for chromatin state prediction. In addition, we develop a novel equalization loss that increases the penalty on generated samples, which alleviates the impact of the bias between ground-truth and generated samples. DeepChrom achieves outstanding performance on nine human cell types with our designed paradigm. Specifically, our proposed long-tailed learning strategy surpasses the traditional training method by 0.056 in accuracy (Acc). To our knowledge, DeepChrom is the first framework to use a diffusion-based model to achieve sample balance when predicting long-tailed chromatin states.
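The two ideas named in the abstract, balancing classes with generated pseudo-samples and penalizing generated samples more heavily in the loss, can be illustrated with a minimal Python sketch. This is not the DeepChrom implementation: the abstract does not give the form of the equalization loss, so the weighted cross-entropy below is a stand-in, and the function names `pseudo_samples_needed` and `weighted_cross_entropy` and the parameter `generated_weight` are hypothetical.

```python
import math
from collections import Counter


def pseudo_samples_needed(labels):
    """Count how many generated samples each class needs to match the largest class.

    Assumption: "balance" here means every class reaches the size of the
    most frequent class; the paper may use a different target.
    """
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def weighted_cross_entropy(probs, labels, is_generated, generated_weight=2.0):
    """Cross-entropy averaged over samples, with a larger weight on generated samples.

    probs: per-sample lists of class probabilities
    labels: integer class index for each sample
    is_generated: True where the sample is a pseudo-sample
    generated_weight: hypothetical multiplier (> 1) applied to generated samples
    """
    total = 0.0
    for p, y, gen in zip(probs, labels, is_generated):
        weight = generated_weight if gen else 1.0
        # Clamp the probability to avoid log(0).
        total += -weight * math.log(max(p[y], 1e-12))
    # Normalized by the number of samples, not by the sum of weights.
    return total / len(labels)


# A long-tailed toy label set: class 0 dominates, classes 1 and 2 are rare.
needed = pseudo_samples_needed([0, 0, 0, 0, 1, 1, 2])

# One real and one generated sample with identical predictions:
# the generated sample contributes twice as much to the loss.
loss = weighted_cross_entropy(
    probs=[[0.5, 0.5], [0.5, 0.5]],
    labels=[0, 1],
    is_generated=[False, True],
)
```

In this toy case `needed` reports how many pseudo-samples a generator would have to produce per class, and `loss` shows how a mistake on a generated sample costs more than the same mistake on a real one.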