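Subsequence
Encoding
Embedding
Computer science
Sequence (biology)
Artificial intelligence
RNA-binding protein
Identification (biology)
Feature (linguistics)
Ribonucleic acid
Computational biology
Recurrent neural network
Deep learning
Artificial neural network
Pattern recognition (psychology)
Machine learning
Gene
Biology
Mathematics
Genetics
Mathematical analysis
Philosophy
Plant
Linguistics
Bounded function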
Authors
Xinyi Wang, Mingyang Zhang, Chunlin Long, Lin Yao, Min Zhu
Identifier
DOI:10.1109/tcbb.2022.3204661
Abstract
Proteins that bind to ribonucleic acid (RNA) inside cells are called RNA-binding proteins (RBPs), and they play a crucial role in gene regulation. Identifying RNA-protein binding sites helps clarify the function of RBPs. Although many computational methods have been developed to predict RNA-protein binding sites, their prediction accuracy on small-sample datasets needs improvement. To overcome this limitation, we propose a novel model called SA-Net, which uses k-mer embedding to encode RNA sequences and a self-attention-based neural network to extract sequence features. K-mer embedding helps the model discover significant subsequence fragments associated with binding sites. The self-attention mechanism captures contextual information globally, across the entire input sequence, and performs well in small-sample sequence learning. Experimental results demonstrate that SA-Net attains state-of-the-art results on the RBP-24 dataset. We find that 4-mer embedding gives the model its best performance. We also show that the self-attention network outperforms the commonly used CNN and CNN-BLSTM models in sequence feature extraction.
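The two ideas named in the abstract, k-mer encoding and self-attention over the encoded sequence, can be illustrated with a minimal sketch. This is not the SA-Net implementation: the function names, the stride-1 overlapping k-mers, the randomly initialised embedding table, the embedding width `d = 8`, and the single attention head are all assumptions made for illustration, and the abstract does not say how the embeddings are trained. The sketch also assumes the input contains only the letters A, C, G, U (or T).

```python
import math
import random
from itertools import product


def kmer_tokens(seq, k=4):
    """Split a sequence into overlapping k-mers (stride 1, an assumption)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=4, alphabet="ACGU"):
    """Map every possible k-mer over the alphabet to an integer index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def matmul(a, b):
    """Plain-Python matrix product of two row-major matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)               # every position scored against every other
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


# Hypothetical toy input; the embedding and projection weights are random
# placeholders standing in for learned parameters.
seq = "AUGGCUACGUAGC"
tokens = kmer_tokens(seq, k=4)
vocab = build_vocab(k=4)
rng = random.Random(0)
d = 8
embed = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in vocab]
w_q, w_k, w_v = ([[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d)]
                 for _ in range(3))

x = [embed[vocab[t]] for t in tokens]     # one embedding vector per k-mer
out, weights = self_attention(x, w_q, w_k, w_v)
print(len(tokens), len(out), len(out[0]))  # prints: 10 10 8
```

Each row of `weights` is a probability distribution over all k-mer positions, which is what lets every position draw on context from the whole sequence rather than from a fixed local window as in a convolution.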