Computer science
Inference
Artificial intelligence
Convolutional neural network
Deep learning
Coding (set theory)
DNA
Machine learning
Artificial neural network
Source code
Range (aeronautics)
Pattern recognition (psychology)
Biology
Composite material
Set (abstract data type)
Materials science
Programming language
Operating system
Genetics
Authors
Lei Cheng,Tong Yu,Ruslan Khalitov,Zhirong Yang
Identifier
DOI: 10.1016/j.neunet.2023.12.002
Abstract
DNA molecules commonly exhibit wide interactions between the nucleobases. Modeling the interactions is important for obtaining accurate sequence-based inference. Although many deep learning methods have recently been developed for modeling DNA sequences, they still suffer from two major issues: 1) most existing methods can handle only short DNA fragments and fail to capture long-range information; 2) current methods always require massive supervised labels, which are hard to obtain in practice. We propose a new method to address both issues. Our neural network employs circular dilated convolutions as building blocks in the backbone. As a result, our network can take long DNA sequences as input without any condensation. We also incorporate the neural network into a self-supervised learning framework to capture inherent information in DNA without expensive supervised labeling. We have tested our model in two DNA inference tasks, the human variant effect and the open chromatin region of plants, where the experimental results show that our method outperforms five other deep learning models. Our code is available at https://github.com/wiedersehne/cdilDNA.
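The backbone described in the abstract stacks circular dilated convolutions so that very long DNA sequences can be processed at full length. Below is a minimal sketch of that idea in PyTorch, assuming circular padding plus exponentially growing dilation across layers; the class names, channel sizes, and layer counts are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class CircularDilatedConvBlock(nn.Module):
    """One backbone block: Conv1d with circular padding and a fixed dilation,
    so the receptive field wraps around the sequence without shortening it."""

    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.conv = nn.Conv1d(
            channels, channels, kernel_size,
            dilation=dilation,
            padding=dilation * (kernel_size - 1) // 2,
            padding_mode="circular",  # circular padding keeps the length unchanged
        )
        self.norm = nn.BatchNorm1d(channels)
        self.act = nn.GELU()

    def forward(self, x):  # x: (batch, channels, seq_len)
        return self.act(self.norm(self.conv(x))) + x  # residual connection


class CDILBackbone(nn.Module):
    """Stack of blocks with exponentially growing dilation; the receptive field
    covers a long DNA sequence after only O(log N) layers, with no condensation."""

    def __init__(self, vocab_size=5, channels=64, num_layers=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, channels)  # A/C/G/T/N -> vectors
        self.blocks = nn.ModuleList(
            [CircularDilatedConvBlock(channels, dilation=2 ** i) for i in range(num_layers)]
        )

    def forward(self, tokens):  # tokens: (batch, seq_len) integer-encoded bases
        x = self.embed(tokens).transpose(1, 2)  # -> (batch, channels, seq_len)
        for block in self.blocks:
            x = block(x)
        return x  # per-position features, same length as the input


if __name__ == "__main__":
    model = CDILBackbone()
    dna = torch.randint(0, 5, (2, 4096))  # two toy sequences of length 4096
    print(model(dna).shape)               # torch.Size([2, 64, 4096])
```

The per-position output of such a backbone could then feed either the self-supervised objective mentioned in the abstract (e.g. predicting masked bases) or a downstream inference head, though those components are not shown here.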