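Computer science
Artificial intelligence
Subcellular localization
Convolutional neural network
Machine learning
Deep learning
Biology
Gene
Biochemistry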
Authors
Haibin Liu,Dianguo Li,Hao Wu
Source
Journal: IEEE Journal of Biomedical and Health Informatics
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume (Issue): 28 (1): 538-547
Identifier
DOI:10.1109/jbhi.2023.3324709
Abstract
Recent studies have highlighted the critical roles of long non-coding RNAs (lncRNAs) in various biological processes, including but not limited to dosage compensation, epigenetic regulation, cell cycle regulation, and cell differentiation regulation. Consequently, lncRNAs have emerged as a central focus in genetic studies. The identification of the subcellular localization of lncRNAs is essential for gaining insights into crucial information about lncRNA interaction partners, post- or co-transcriptional regulatory modifications, and external stimuli that directly impact the function of lncRNA. Computational methods have emerged as a promising avenue for predicting the subcellular localization of lncRNAs. However, there is a need for additional enhancement in the performance of current methods when dealing with unbalanced data sets. To address this challenge, we propose a novel ensemble deep learning framework, termed lncLocator-imb, for predicting the subcellular localization of lncRNAs. To fully exploit lncRNA sequence information, lncLocator-imb integrates two base classifiers, including convolutional neural networks (CNN) and gated recurrent units (GRU). Additionally, it incorporates two distinct types of features, including the physicochemical pattern feature and the distributed representation of nucleic acids feature. To address the problem of poor performance exhibited by models when confronted with unbalanced data sets, we utilize the label-distribution-aware margin (LDAM) loss function during the training process. Compared with traditional machine learning models and currently available predictors, lncLocator-imb demonstrates more robust category imbalance tolerance. Our study proposes an ensemble deep learning framework for predicting the subcellular localization of lncRNAs. Additionally, a novel approach is presented for the management of different features and the resolution of unbalanced data sets. 
The proposed framework exhibits the potential to serve as a significant resource for various sequence-based prediction tasks, providing a versatile tool that can be utilized by professionals in the fields of bioinformatics and genetics.
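The abstract names the label-distribution-aware margin (LDAM) loss but does not give its form. As a rough illustration only, the sketch below follows the commonly published definition of LDAM: each class j receives a margin proportional to n_j^(-1/4), where n_j is its training-set count, and that margin is subtracted from the true-class logit before a softmax cross-entropy. The function names, the `max_margin` and `s` parameters, and the class counts in the usage note are assumptions made for illustration, not details taken from lncLocator-imb.

```python
import math


def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j ** -0.25.

    The margins are rescaled so the rarest class gets `max_margin`
    (a hypothetical default, not a value from the paper).
    """
    raw = [1.0 / (n ** 0.25) for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * r for r in raw]


def ldam_loss(logits, label, margins, s=1.0):
    """LDAM loss for one sample.

    The margin is subtracted from the true-class logit only, then an
    ordinary softmax cross-entropy is taken over the (optionally scaled)
    logits. `s` is an assumed logit scale factor.
    """
    adjusted = [z - margins[j] if j == label else z
                for j, z in enumerate(logits)]
    scaled = [s * z for z in adjusted]
    # Numerically stable log-sum-exp.
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return log_sum_exp - scaled[label]
```

For example, with hypothetical class counts `[1000, 10]`, `ldam_margins` assigns the rare class the larger margin, so a sample from that class must be classified with more headroom to reach the same loss. This is the mechanism by which the loss pushes the decision boundary away from minority classes.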