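Biology
Preprocessor
Gene knockout
Computer science
Artificial intelligence
Gene silencing
Computational biology
Machine learning
Perceptron
RNA interference
Small interfering RNA
Artificial neural network
Transfection
Gene
Ribonucleic acid
Genetics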
Authors
Jiayu Xu, Nan Xu, Weixin Xie, Chengkui Zhao, Lei Yu, Weixing Feng
Source
Journal: Gene
[Elsevier]
Date: 2024-02-01
Volume/Issue: 148330-148330
Identifiers
DOI:10.1016/j.gene.2024.148330
Abstract
Silencing mRNA with siRNA is central to RNA interference (RNAi), so accurate computational methods for siRNA selection are needed. Current machine-learning approaches often require large amounts of data and intricate data preprocessing, which reduces their accuracy. To address this, we propose BERT-siRNA, a BERT-based method for predicting the knockdown efficiency of siRNAs on their target genes. It consists of a pre-trained DNA-BERT module and a multilayer perceptron (MLP) module. The method applies transfer learning to overcome the limitation of small sample sizes and to avoid extensive preprocessing: DNA-BERT is pre-trained on large-scale genomic data, and the model is then fine-tuned on various siRNA datasets to enhance its predictive capability. On an independent public siRNA dataset, our model clearly outperforms all existing siRNA prediction models. Furthermore, its consistent predictions of high-efficiency siRNA knockdown for SARS-CoV-2, together with its agreement with experimental results for PDCD1, CD38, and IL6, demonstrate the model's reliability and stability. In addition, the attention scores over all 19-nt positions in the dataset indicate that the model's attention is concentrated mainly on the 5′ end of the siRNA. Step-by-step visualization of the hidden layers' classification progressively clarifies the effective feature extraction performed by the MLP layers. Explaining the model through analysis of its attention scores and hidden layers is also a main goal of this work, making the model more interpretable and reliable for biological researchers.
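The position-level attention analysis described in the abstract can be sketched as follows. This is a minimal sketch under assumptions the abstract does not state: that the 19-nt siRNA is tokenized into overlapping k-mers as in DNABERT (k = 6 is assumed here), that U is rewritten as T to match a DNA vocabulary, and that per-token attention is mapped back to nucleotides by averaging over the k-mers covering each position. The function names `sirna_to_kmers` and `token_to_position_scores` are hypothetical; in practice the token scores would come from the fine-tuned model, which is not reproduced here.

```python
from typing import List


def sirna_to_kmers(sirna: str, k: int = 6) -> List[str]:
    """Split an siRNA into overlapping k-mers (assumed DNABERT-style tokenization).

    Assumption: U is rewritten as T so the RNA sequence fits a DNA k-mer vocabulary.
    """
    seq = sirna.upper().replace("U", "T")
    if len(seq) < k:
        raise ValueError(f"sequence of length {len(seq)} is shorter than k={k}")
    # One k-mer starts at every position that leaves room for k nucleotides.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def token_to_position_scores(token_scores: List[float], seq_len: int, k: int = 6) -> List[float]:
    """Average k-mer token attention onto the nucleotide positions each k-mer covers.

    Hypothetical mapping: token t covers nucleotides t .. t + k - 1, and each
    nucleotide's score is the mean over all tokens that cover it.
    """
    if seq_len < k:
        raise ValueError(f"seq_len={seq_len} is shorter than k={k}")
    n_tokens = seq_len - k + 1
    if len(token_scores) != n_tokens:
        raise ValueError(f"expected {n_tokens} token scores, got {len(token_scores)}")
    totals = [0.0] * seq_len
    counts = [0] * seq_len
    for t, score in enumerate(token_scores):
        for pos in range(t, t + k):
            totals[pos] += score
            counts[pos] += 1
    # Every position is covered by at least one token because seq_len >= k.
    return [total / count for total, count in zip(totals, counts)]


if __name__ == "__main__":
    guide = "UUCGAAGUAUUCCGCGUAC"  # hypothetical 19-nt guide strand
    kmers = sirna_to_kmers(guide)
    print(len(kmers), kmers[:2])  # 14 ['TTCGAA', 'TCGAAG']
    # Toy attention that puts all weight on the first (5'-most) k-mer.
    toy_scores = [1.0] + [0.0] * (len(kmers) - 1)
    print(token_to_position_scores(toy_scores, len(guide)))
```

Averaging rather than summing keeps the two ends, which are covered by fewer k-mers, on the same scale as the middle positions; a summed mapping would pull the 5′ and 3′ scores down and could mask a genuine 5′-end concentration of attention.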