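Computer science
Artificial intelligence
Computational biology
Ribonucleic acid (RNA)
Pattern recognition (psychology)
Biology
Genetics
Gene
Authors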
Niannian Liu,Zequn Zhang,Yanan Wu,Yinglong Wang,Ying Liang
Identifier
DOI:10.1109/tcbb.2023.3272400
Abstract
Circular RNA (CircRNA) is widely expressed and has physiological and pathological significance, regulating post-transcriptional processes through its protein-binding activity. However, whereas linear RNA and RNA-binding proteins (RBPs) have been studied extensively, little is known about the binding sites of CircRNA. This paper reports a mid-term multimodal data fusion strategy, CRBSP, for predicting CircRNA-RBP binding sites. CRBSP represents the semantic, positional, compositional and frequency information of CircRNA trinucleotides with four corresponding encodings: Word to vector (Word2vec), position-specific trinucleotide propensity (PSTNP), pseudo trinucleotide composition (PseTNC) and trinucleotide composition (TNC), respectively. A convolutional neural network (CNN) is used to extract global information, and a bidirectional long short-term memory (BiLSTM) encoder with a long short-term memory (LSTM) decoder is used for local sequence information. A self-attention mechanism enhances the contributions of key features, after which the four enhanced features are combined by mid-term fusion. With a logistic regression (LR) classifier, CRBSP achieves a mean AUC of 0.9362 under 5-fold cross-validation across all 37 datasets, outperforming five current state-of-the-art models. A similar evaluation on linear RNA-RBP binding sites gives an AUC of 0.7615, which is also higher than that of other prediction methods, demonstrating the robustness of CRBSP.
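
To make the frequency encoding named in the abstract more concrete, the following is a minimal sketch of a trinucleotide composition (TNC) vector, assuming the common definition: the normalized count of each of the 64 overlapping trinucleotides in a sequence. The abstract does not give the authors' exact formulation, so the function name `tnc_encode`, the mapping of T to U, and the skipping of windows with ambiguous bases are illustrative assumptions and not the CRBSP implementation.

```python
from itertools import product

# Assumption: RNA alphabet; DNA-style "T" in the input is mapped to "U".
ALPHABET = "ACGU"
# All 64 trinucleotides in lexicographic order (AAA, AAC, ..., UUU).
TRINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=3)]


def tnc_encode(sequence):
    """Return a 64-dimensional vector of overlapping trinucleotide frequencies.

    Hypothetical helper for illustration; windows containing characters
    outside A/C/G/U (for example N) are skipped.
    """
    seq = sequence.upper().replace("T", "U")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    # Slide a window of width 3 over the sequence with stride 1.
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        # No valid trinucleotide found: return an all-zero vector.
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[k] / total for k in TRINUCLEOTIDES]


if __name__ == "__main__":
    vector = tnc_encode("ACGUACGU")
    print(len(vector), sum(vector))
```

A fixed-length vector of this kind can be computed for every sequence fragment regardless of its length, which is presumably why composition-style encodings are convenient inputs for downstream network layers and classifiers.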