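Computer science
Sequence (biology)
Artificial intelligence
Convolutional neural network
Pattern recognition (psychology)
Feature (linguistics)
Encoder
Encoding (memory)
Data mining
Computational biology
Theoretical computer science
Linguistics
Genetics
Biology
Operating system
Philosophy
Authors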
Wenjing Yin,Shudong Wang,Sibo Qiao,Yuanyuan Zhang,Shanchen Pang
Identifier
DOI:10.1016/j.engappai.2024.108429
Abstract
Circular ribonucleic acids (circRNAs) are single-stranded RNA molecules that form closed loops and are widely expressed in various cells and tissues. They interact with RNA-binding proteins (RBPs) and play a vital regulatory role in the onset and development of several diseases. To reveal the biological functions of circRNAs, researchers have proposed various hybrid prediction architectures, based on convolutional neural networks and recurrent neural networks, that identify circRNA-RBP interactions and binding sites. However, existing methods usually ignore the structural information of circRNAs, which may impair the modeling of circRNA-RBP binding modes. To address this problem, we propose a prediction model based on multi-resolution feature extraction. First, it generates sequence features using unsupervised word embedding and nucleotide density, then applies implicit and explicit pseudo-secondary-structure hybrid encoding to fuse sequence and structural information and better simulate circRNA-RBP binding patterns. Second, it uses an enhanced bidirectional sample convolution and interaction network encoder to capture and integrate high-order features at distinct resolutions from the multi-scale convolution module. This provides semantically rich input to the downstream bidirectional long short-term memory network and improves prediction accuracy. Experimental results on 37 circRNA datasets and 31 linear RNA datasets show that our method has significant advantages in identifying RNA-RBP interactions. Furthermore, the four motifs learned by our method are verified against existing motif databases, indicating that it can discover biologically meaningful circRNA-RBP binding patterns.
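The abstract does not spell out how the sequence features are computed. As a rough illustration only, the sketch below shows one common way to prepare the two inputs it names: overlapping k-mer tokens (the "words" typically fed to an unsupervised embedding model) and a per-position nucleotide density. The function names, the choice of k = 3, and the density definition (count of the current nucleotide in the prefix, divided by the prefix length) are assumptions for illustration, not details taken from the paper.

```python
from typing import Dict, List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1.

    Hypothetical helper: k-mers like these are commonly used as 'words'
    for an unsupervised embedding model; k = 3 is an assumed default.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def nucleotide_density(seq: str) -> List[float]:
    """Per-position nucleotide density (assumed definition).

    For 1-based position i, the value is the number of times the
    nucleotide at position i occurs in seq[:i], divided by i.
    """
    counts: Dict[str, int] = {}
    density: List[float] = []
    for i, base in enumerate(seq.upper(), start=1):
        counts[base] = counts.get(base, 0) + 1
        density.append(counts[base] / i)
    return density


if __name__ == "__main__":
    example = "ACAAGU"  # toy RNA fragment, not from any dataset
    print(kmer_tokens(example))
    print(nucleotide_density(example))
```

In a full pipeline the k-mer tokens would be mapped to learned embedding vectors and concatenated with the density values per position; that step depends on the embedding model and is left out here.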