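Homo sapiens
Subcellular localization
Computational biology
Set (abstract data type)
Computer science
Biology
Machine learning
Artificial intelligence
Gene
Genetics
Anthropology
Sociology
Programming language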
Authors
Zhao‐Yue Zhang, Zheng Zhang, Xiucai Ye, Tetsuya Sakurai, Hao Lin
Identifier
DOI:10.1016/j.ijbiomac.2024.130659
Abstract
Understanding the subcellular localization of lncRNAs is crucial for comprehending their regulatory activities. The conventional detection of lncRNA subcellular location usually uses in situ detection techniques, which are resource intensive. Some machine learning-based algorithms have been proposed for lncRNA subcellular location prediction in mammals. However, due to the low level of conservation of lncRNA sequences, the performance of cross-species models remains unsatisfactory. In this study, we curated a novel dataset containing subcellular location information of lncRNAs in Homo sapiens. Subsequently, based on the BERT pre-trained language algorithm, we developed a model for lncRNA subcellular location prediction. Our model achieved a micro-average area under the receiver operating characteristic curve (AUROC) of 0.791 on the training set and an AUROC of 0.700 on the testing nucleus set. Additionally, we conducted cross-species validation and motif discovery to further investigate underlying patterns. In summary, our study provides valuable guidance and computational analysis tools for exploring the mechanisms of lncRNA subcellular localization and the dynamic spatial changes of RNA in abnormal physiological states.