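Non-coding RNA
Coding
Computer science
Encoder
Computational biology
Coding (social sciences)
Transformer
Artificial intelligence
Computational model
Machine learning
Biology
Small RNA
Gene
Mathematics
Genetics
Engineering
Statistics
Voltage
Electrical engineering
Operating system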
Authors
Jingpu Zhang,Hao Lu,Jiang Ying,Yuanyuan Ma,Lei Deng
Identifier
DOI:10.1021/acs.jcim.4c01097
Abstract
Many noncoding RNAs (ncRNAs) have been identified, and many of them play vital roles in various biological processes, including gene expression regulation, epigenetic regulation, transcription, and control. Recently, a few observations have revealed that ncRNAs can be translated into functional peptides. Moreover, many computational methods have been developed to predict the coding potential of these transcripts, which contributes to a deeper investigation of their functions. However, most of these methods are used to distinguish ncRNAs from mRNAs. It is therefore important to develop a highly accurate computational tool for identifying the coding potential of ncRNAs, thereby contributing to the discovery of novel peptides. In this Article, we propose a novel BiLSTM And Transformer encoder-based model (nBAT) with intrinsic features encoded for ncRNA coding potential prediction. In nBAT, we introduce a learnable position encoding mechanism to better obtain the embeddings of the ncRNA sequence. Moreover, we extract 43 intrinsic features from different perspectives and encode these features into the Transformer encoder by calculating their distances. Our performance comparisons show that nBAT outperforms state-of-the-art methods for coding potential prediction on different datasets. We also apply the method to new ncRNAs to identify their coding potential, and the results further indicate the competitive performance of nBAT. We expect that the method can serve as a useful tool for high-throughput coding potential prediction for ncRNAs.
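
The abstract does not say how nBAT implements its learnable position encoding, so the following is only a generic sketch of the idea, not the authors' code. It assumes the common formulation in which each sequence position has its own trainable vector that is added to the nucleotide's token vector; in a real model both tables would be parameters updated by gradient descent, whereas here they are simply randomly initialised. All names (`init_table`, `embed`, `DIM`, `MAX_LEN`) are hypothetical.

```python
import random

NUCLEOTIDES = "ACGU"  # assumed alphabet; other symbols (e.g. N) are not handled


def init_table(rows, dim, rng):
    """Create a rows x dim table of small random values (stand-in for trainable weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed(sequence, token_table, position_table):
    """Return one vector per nucleotide: token vector + position vector."""
    sequence = sequence.upper().replace("T", "U")  # accept DNA-style input
    if len(sequence) > len(position_table):
        raise ValueError("sequence is longer than the position table")
    embedded = []
    for pos, base in enumerate(sequence):
        token_vec = token_table[NUCLEOTIDES.index(base)]
        position_vec = position_table[pos]  # learned per position, not a fixed sinusoid
        embedded.append([t + p for t, p in zip(token_vec, position_vec)])
    return embedded


rng = random.Random(0)
DIM, MAX_LEN = 8, 16  # illustrative sizes, not taken from the paper
token_table = init_table(len(NUCLEOTIDES), DIM, rng)
position_table = init_table(MAX_LEN, DIM, rng)

embedded = embed("AUGGCA", token_table, position_table)
```

Because the position vectors differ, the two `A` nucleotides in `AUGGCA` receive different embeddings, which is what lets a downstream encoder tell positions apart.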