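Computer science
Sequence (biology)
Feature (linguistics)
Artificial intelligence
Pattern recognition (psychology)
Subcellular localization
Feature extraction
Data mining
Biology
Gene
Genetics
Linguistics
Philosophy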
Authors
Yongxian Fan,Meijun Chen,Qingqi Zhu
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2020-01-01
Volume: 8, pages 124702-124711
Citations: 7
Identifier
DOI:10.1109/access.2020.3007317
Abstract
Determining the subcellular localization of long non-coding RNAs (lncRNAs) provides valuable clues to their function. As an alternative to time-consuming and expensive biochemical experiments, we develop lncLocPred, a machine learning predictor based on logistic regression, to predict the subcellular localization of lncRNAs. We adopt sequence features including k-mer, triplet, and PseDNC, and systematically perform feature selection through VarianceThreshold, binomial distribution, and F-score to obtain representative features. We observe that the top-ranked k-mers have a higher G and C content, in the form of short repeats. While improving prediction accuracy on several subcellular localizations, our model achieves an overall accuracy of 92.37% on the benchmark dataset under the jackknife test, higher than existing state-of-the-art predictors. Additionally, lncLocPred also performs better on an independent dataset that we collected. Related experimental data and source code are available at https://github.com/jademyC1221/lncLocPred.
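
The abstract names k-mer features and a VarianceThreshold filter among its steps. As a rough illustration of those two ideas only, the sketch below computes normalized k-mer frequencies and drops feature columns whose variance does not exceed a threshold. It is not the authors' implementation (their code is in the repository linked above); the function names `kmer_frequencies` and `variance_filter`, the toy sequences, and the default threshold of 0.0 are all assumptions made for this example.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Return normalized k-mer frequencies over ACGT, in lexicographic k-mer order."""
    seq = seq.upper().replace("U", "T")  # treat RNA input as its DNA alphabet
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def variance_filter(rows, threshold=0.0):
    """Return indices of feature columns whose population variance exceeds threshold."""
    n = len(rows)
    keep = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        if var > threshold:
            keep.append(j)
    return keep


# Toy sequences invented for illustration; they are not from the paper's dataset.
toy_sequences = ["GCGCGCAT", "ATATATGC", "GGCCGGCC"]
features = [kmer_frequencies(s, 2) for s in toy_sequences]
kept = variance_filter(features)
```

Here `features` holds one 16-dimensional 2-mer vector per sequence, and `kept` lists the column indices that vary across the toy sequences. In a fuller pipeline the surviving columns would be ranked further and passed to a classifier such as logistic regression, as the abstract describes.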