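Enhancer
Computer science
Transfer of learning
Encoder
Transformer
Artificial intelligence
Transcription (linguistics)
Gene
Transcription factor
Biology
Genetics
Physics
Operating system
Voltage
Philosophy
Quantum mechanics
Linguistics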
Authors
Hanyu Luo, Libin Chen, Wenyu Shan, Pingjian Ding, Lingyun Luo
Identifier
DOI: 10.1007/978-3-031-13829-4_13
Abstract
Enhancers are short DNA segments that bind proteins known as transcription factors. This binding strengthens the transcription of a gene, so enhancers play an essential role in gene expression. Machine learning-based methods have recently become a common approach for identifying enhancers and predicting their strength. In this study, we propose iEnhancer-BERT, a novel transfer learning method built on a DNA language model pre-trained on the whole human genome. Specifically, iEnhancer-BERT consists of a BERT layer for feature extraction and a CNN layer for classification. We initialize the parameters of the BERT layer from the pre-trained DNA language model and fine-tune them on the enhancer identification tasks. Unlike common fine-tuning strategies, which typically use only the output of the final layer, we extract the outputs of all Transformer encoder layers to form the feature vector. Experiments show that our method achieves state-of-the-art results on both the enhancer identification task and the strong-enhancer identification task. The code and data are publicly available at https://github.com/lhy0322/iEnhancer-BERT.
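The central architectural idea above, feeding the outputs of every encoder layer (not just the last one) into a CNN classifier, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is in the linked repository). It assumes that the feature taken from each layer is the first-token ([CLS]) vector, and that the CNN is a single 1D convolution sliding along the layer axis, followed by ReLU, global max pooling, and a sigmoid output. All function names, weights, and toy dimensions are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def stack_layer_features(hidden_states: List[Matrix]) -> Matrix:
    """Collect the first-token ([CLS]) vector of every encoder layer.

    ASSUMPTION: the per-layer feature is the first token's hidden vector.
    hidden_states[l][t] is the hidden vector of token t at encoder layer l.
    Returns an L x d matrix: one row per encoder layer.
    """
    return [layer[0] for layer in hidden_states]


def conv1d(features: Matrix, kernels: List[Matrix], bias: Vector) -> Matrix:
    """Valid 1D convolution that slides along the layer axis.

    kernels[c][j] is the length-d weight vector that output channel c
    applies to row i + j of the window starting at row i.
    Returns an (L - k + 1) x C matrix, where k = len(kernels[0]).
    """
    k = len(kernels[0])
    out = []
    for i in range(len(features) - k + 1):
        row = []
        for c, kernel in enumerate(kernels):
            s = bias[c]
            for j in range(k):
                s += sum(w * x for w, x in zip(kernel[j], features[i + j]))
            row.append(s)
        out.append(row)
    return out


def classify(hidden_states: List[Matrix], kernels: List[Matrix],
             conv_bias: Vector, w_out: Vector, b_out: float) -> float:
    """Return a sigmoid probability that the sequence is an enhancer."""
    feats = stack_layer_features(hidden_states)
    # ReLU after the convolution.
    conv = [[max(0.0, v) for v in row] for row in conv1d(feats, kernels, conv_bias)]
    # Global max pooling: keep the strongest response of each channel.
    pooled = [max(col) for col in zip(*conv)]
    logit = b_out + sum(w * p for w, p in zip(w_out, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sizes; a BERT-base encoder would have 12 layers with d = 768.
    layers, tokens, dim, channels, width = 12, 5, 8, 4, 3
    hidden = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)]
              for _ in range(layers)]
    kernels = [[[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(width)]
               for _ in range(channels)]
    w_out = [rng.gauss(0, 0.1) for _ in range(channels)]
    # Random (untrained) weights, so the value only demonstrates the data flow.
    print(f"P(enhancer) = {classify(hidden, kernels, [0.0] * channels, w_out, 0.0):.3f}")
```

Convolving along the layer axis lets each filter combine features from neighbouring encoder depths, which is one plausible motivation for using all layers rather than only the final one. In a real model these weights would be learned jointly with BERT fine-tuning rather than drawn at random.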