MulTFBS: A Spatial-Temporal Network with Multichannels for Predicting Transcription Factor Binding Sites

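Computer science; DNA binding site; Embedding; Convolutional neural network; Artificial intelligence; Deep learning; DNA microarray; Transcription factor; Encoding (memory); Word embedding; Sequence (biology); k-mer; Data mining; DNA sequencing; Pattern recognition (psychology); DNA; Gene; Promoter; Biology; Genetics; Gene expression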

Authors

Jujuan Zhuang,Xinru Huang,Shuhan Liu,Wanquan Gao,Rui Su,Kexin Feng

Source

Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Date: 2024-05-11  Volume (Issue): 64 (10): 4322-4333

Links

nih.gov, doi.org

Identifier

DOI: 10.1021/acs.jcim.3c02088

Abstract

Revealing the mechanisms that influence transcription factor binding specificity is key to understanding gene regulation. Previous studies have successfully used DNA double-helix structure and one-hot embedding to design computational methods for predicting transcription factor binding sites (TFBSs). However, although a DNA sequence can be viewed as a kind of biological language, word embedding representations from natural language processing have not been properly considered in TFBS prediction models. In this work, we integrate different types of DNA sequence features into a multichannel deep learning framework, MulTFBS, in which three feature types are trained in separate channels: one-hot encoding; word embedding encoding, which can incorporate contextual information and extract global features of the sequences; and three-dimensional structural features of the double helix. To extract high-level sequence information effectively, we use a spatial-temporal network that combines convolutional neural networks and bidirectional long short-term memory networks with an attention mechanism. Compared with six state-of-the-art methods on 66 universal protein-binding microarray data sets covering different transcription factors, MulTFBS performs best on all data sets in the regression tasks, with an average R² of 0.698 and an average Pearson correlation coefficient (PCC) of 0.833, which are 5.4% and 3.2% higher, respectively, than those of the second-best method, CRPTS. In addition, we evaluate the classification performance of MulTFBS in distinguishing bound from unbound regions on TF ChIP-seq data. The results show that our framework also performs well in TFBS classification tasks.
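
The abstract names two sequence encodings (one-hot and k-mer word embedding) and two regression metrics (R² and PCC). The following is a minimal Python sketch of those building blocks, not the authors' implementation. The function names, the choice of k = 3, the all-zero row for unknown bases, and the use of the coefficient of determination for R² are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math

BASES = "ACGT"  # column order of the one-hot matrix (an assumption of this sketch)


def one_hot(seq):
    """Encode a DNA sequence as one 4-element row per base, in A, C, G, T order."""
    # Bases outside ACGT (for example N) become an all-zero row; this is an assumption.
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers, the 'words' a word-embedding layer would consume."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pearson(y_true, y_pred):
    """Pearson correlation coefficient (PCC) between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition of R²)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    print(one_hot("ACGT"))
    print(kmer_tokens("ACGTA", k=3))
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In a multichannel model of the kind described, the one-hot matrix and the k-mer token sequence would feed separate input channels, and the two metrics would be computed per data set and then averaged.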
