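Non-negative matrix factorization
Subcellular localization
Identification (biology)
Jackknife resampling
Encoding
Source code
Matrix decomposition
Classifier (UML)
Feature vector
Computational biology
Support vector machine
Computer science
Artificial intelligence
Pattern recognition (psychology)
Biology
Mathematics
Physics
Genetics
Gene
Plant
Operating system
Statistics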
Eigenvector
Estimator
Cytoplasm
Quantum mechanics
Authors
Shengli Zhang,Huijuan Qiao
Identifier
DOI:10.1016/j.ab.2020.113995
Abstract
Long non-coding RNAs (lncRNAs) are functional RNA molecules longer than 200 nucleotides with little or no protein-coding capacity. In recent years, a growing number of studies have shown that the subcellular localization of lncRNAs provides valuable clues to their biological functions, so identifying that localization is important. In this paper, a novel statistical model named KD-KLNMF is constructed to predict lncRNA subcellular localization. First, k-mer composition and dinucleotide-based spatial autocorrelation are combined into the feature vector. Then, the Synthetic Minority Over-sampling Technique (SMOTE) is used to handle the imbalanced dataset. Next, Kullback-Leibler divergence-based nonnegative matrix factorization is applied to select the optimal features. A support vector machine is then adopted as the classifier after comparison with other classifiers. Finally, the jackknife test is performed to evaluate the model. The overall accuracies reach 97.24% on the training dataset and 92.86% on the independent dataset. These results are better than those of previous methods, indicating that the model is a useful and feasible tool for identifying lncRNA subcellular localization. The datasets and source code are freely available at https://github.com/HuijuanQiao/KD-KLNMF.
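Two of the steps named in the abstract, k-mer feature extraction and the jackknife (leave-one-out) test, can be illustrated with a minimal sketch. This is not the authors' code, which is available at the repository above. The function names `kmer_frequencies` and `jackknife_accuracy` are hypothetical, and a 1-nearest-neighbour rule is swapped in for the support vector machine so that the example needs only the Python standard library. The dinucleotide-based autocorrelation features, SMOTE, and the KL-divergence NMF step are omitted.

```python
from itertools import product
from math import dist


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Return the normalized k-mer frequency vector of a nucleotide sequence."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA letters alike
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def jackknife_accuracy(features, labels):
    """Leave-one-out accuracy with a 1-nearest-neighbour stand-in classifier.

    Assumption: the paper uses an SVM; 1-NN is used here only to keep the
    sketch dependency-free.
    """
    correct = 0
    for i, x in enumerate(features):
        # Hold out sample i and predict its label from all remaining samples.
        others = [j for j in range(len(features)) if j != i]
        nearest = min(others, key=lambda j: dist(x, features[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(features)
```

In use, each sequence would be converted with `kmer_frequencies` and the resulting vectors passed to `jackknife_accuracy` together with their localization labels. Each sample is predicted exactly once, from a model that never saw it, which is what distinguishes the jackknife test from a single train/test split.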