德布鲁因图
计算机科学
编码
德布鲁恩序列
图形
卷积神经网络
深度学习
人工智能
模式识别(心理学)
理论计算机科学
生物
数学
遗传学
基因
组合数学
作者
Min Li,Baoying Zhao,Rui Yin,Chengqian Lu,Fei Guo,Min Zeng
摘要
The subcellular localization of long non-coding RNAs (lncRNAs) is crucial for understanding lncRNA functions. Most of existing lncRNA subcellular localization prediction methods use k-mer frequency features to encode lncRNA sequences. However, k-mer frequency features lose sequence order information and fail to capture sequence patterns and motifs of different lengths. In this paper, we proposed GraphLncLoc, a graph convolutional network-based deep learning model, for predicting lncRNA subcellular localization. Unlike previous studies encoding lncRNA sequences by using k-mer frequency features, GraphLncLoc transforms lncRNA sequences into de Bruijn graphs, which transforms the sequence classification problem into a graph classification problem. To extract the high-level features from the de Bruijn graph, GraphLncLoc employs graph convolutional networks to learn latent representations. Then, the high-level feature vectors derived from de Bruijn graph are fed into a fully connected layer to perform the prediction task. Extensive experiments show that GraphLncLoc achieves better performance than traditional machine learning models and existing predictors. In addition, our analyses show that transforming sequences into graphs has more distinguishable features and is more robust than k-mer frequency features. The case study shows that GraphLncLoc can uncover important motifs for nucleus subcellular localization. GraphLncLoc web server is available at http://csuligroup.com:8000/GraphLncLoc/.
科研通智能强力驱动
Strongly Powered by AbleSci AI