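Enhancer
Computer science
Artificial intelligence
Pooling
Deep learning
Pattern recognition (psychology)
Feature (linguistics)
Embedding
Feature extraction
Computational biology
Data mining
Gene
Biology
Genetics
Transcription factor
Philosophy
Linguistics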
Authors
Hanyu Luo,Ye Li,Liu Huan,Pingjian Ding,Ying Yu,Lingyun Luo
Identifier
DOI:10.1016/j.compbiolchem.2023.107905
Abstract
Super-enhancers are large genomic domains in which multiple short typical enhancers within a specific genomic distance are stitched together. They are typically cell type-specific and are responsible for defining cell identity and regulating gene transcription. Numerous studies have shown that super-enhancers are enriched for trait-associated variants and that mutations in super-enhancers may be related to known diseases. Recently, several machine learning-based methods have been used to distinguish super-enhancers from typical enhancers using high-throughput data from various experimental methods, but acquiring such experimental data is usually costly and time-consuming. In this paper, we propose SENet, a method based on a deep neural network model that discriminates between the two categories using sequence information alone. SENet employs dna2vec feature embedding, convolution for local feature extraction, attention pooling for refined feature retention, and a Transformer for contextual information extraction. Experiments demonstrate that SENet outperforms current state-of-the-art computational methods and shows satisfactory performance in cross-species validation. Our method is the first to distinguish super-enhancers from typical enhancers using only sequence information. The source code and datasets are available at https://github.com/lhy0322/SENet.
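The abstract names the pipeline stages but not their settings, so the following is only a minimal sketch of two of them, not SENet's actual implementation. It shows how a DNA sequence might be split into overlapping k-mers and mapped to embedding-table indices (the kind of input a dna2vec-style embedding consumes), and how a softmax-weighted "attention" pooling could reduce a feature track. The function names, `k=6`, `stride=1`, the pooling window, and the use of externally supplied scores (which a real model would learn) are all assumptions made for illustration.

```python
import math
from itertools import product


def kmer_tokens(seq, k=6, stride=1):
    """Split a DNA sequence into overlapping k-mers (k and stride are illustrative)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(k=6):
    """Map every k-mer over ACGT to an index starting at 1; 0 is reserved for unknowns."""
    return {"".join(p): i + 1 for i, p in enumerate(product("ACGT", repeat=k))}


def encode(seq, vocab, k=6):
    """Convert a sequence to embedding-table indices; k-mers with other letters map to 0."""
    return [vocab.get(tok, 0) for tok in kmer_tokens(seq, k)]


def attention_pool(values, scores, window=2):
    """Softmax-weighted average over non-overlapping windows of a 1-D feature track.

    In a trained network the scores would be learned; here they are passed in.
    """
    pooled = []
    for start in range(0, len(values) - window + 1, window):
        v = values[start:start + window]
        s = scores[start:start + window]
        m = max(s)  # subtract the max for numerical stability
        w = [math.exp(x - m) for x in s]
        total = sum(w)
        pooled.append(sum(wi * vi for wi, vi in zip(w, v)) / total)
    return pooled


if __name__ == "__main__":
    vocab = build_vocab(k=3)
    print(kmer_tokens("ACGTAC", k=3))
    print(encode("ACGTAC", vocab, k=3))
    print(attention_pool([1.0, 3.0, 2.0, 2.0], [0.0, 0.0, 1.0, 1.0]))
```

Unlike max pooling, the weighted average lets positions with higher scores dominate a window without discarding the others entirely, which is one plausible reading of "refined feature retention".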