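5-Methylcytosine
Computational biology
Epigenetics
Machine learning
Identification (biology)
Computer science
RNA
Human disease
Artificial intelligence
Bioinformatics
Biology
Genetics
Gene
DNA methylation
Gene expression
Plant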
Authors
Hiroyuki Kurata, Md. Harun-Or-Roshid, Md Mehedi Hasan, Sho Tsukiyama, Kazuhiro Maeda, Balachandran Manavalan
Source
Journal: Methods
[Elsevier]
Date: 2024-07-01
Volume/Issue: 227: 37-47
Identifier
DOI:10.1016/j.ymeth.2024.05.004
Abstract
RNA modification is a pivotal component of numerous biological processes. Among the prevalent modifications, 5-methylcytosine (m5C) significantly influences mRNA export, translation efficiency, and cell differentiation, and is also associated with human diseases, including Alzheimer's disease, autoimmune disease, cancer, and cardiovascular diseases. Identifying m5C sites is critical for understanding RNA modification mechanisms and the epigenetic regulation of associated diseases. However, large-scale experimental identification of m5C presents significant challenges because it is labor-intensive and time-consuming. Several computational tools based on machine learning have been developed to supplement experimental methods, but their identification of these sites still lacks accuracy and efficiency. In this study, we introduce a new predictor, MLm5C, for precise prediction of m5C sites from sequence data. Briefly, we evaluated eleven RNA sequence-derived features with four basic machine learning algorithms to generate 44 baseline models (11 features × 4 algorithms). We ranked these models by performance and stacked the top 20 baseline models into the final model, named MLm5C. MLm5C outperformed state-of-the-art predictors. Notably, optimizing the sequence length surrounding the modification sites significantly improved prediction performance. MLm5C is an invaluable tool for accelerating the detection of m5C sites within the human genome, thereby facilitating the characterization of their roles in post-transcriptional regulation.
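The model-building procedure described in the abstract (a grid of 11 features by 4 algorithms, ranking, then stacking the top 20) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the eleven encodings, the four algorithms, the ranking metric, or the meta-learner, so the `feature_*` and `algorithm_*` labels, the validation-score dictionary, and the logistic-regression meta-learner below are all hypothetical placeholders chosen only to show the general shape of such a pipeline.

```python
import math
from itertools import product

# Hypothetical placeholder names: the abstract does not list the actual
# eleven sequence encodings or the four learning algorithms.
FEATURES = [f"feature_{i}" for i in range(1, 12)]     # 11 encodings
ALGORITHMS = [f"algorithm_{j}" for j in range(1, 5)]  # 4 learners


def baseline_grid():
    """Enumerate every (feature, algorithm) pair: 11 x 4 = 44 baselines."""
    return list(product(FEATURES, ALGORITHMS))


def select_top(scores, k=20):
    """Return the k baseline keys with the highest validation score.

    The ranking metric is an assumption; any scalar score works here.
    """
    return sorted(scores, key=scores.get, reverse=True)[:k]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(meta_x, y, lr=0.5, epochs=500):
    """Fit a logistic-regression meta-learner by batch gradient descent.

    meta_x: rows of baseline-model probabilities, one column per
            selected baseline model.
    y:      0/1 labels (1 = m5C site, 0 = non-site).
    The choice of logistic regression is an assumption of this sketch.
    """
    n_columns = len(meta_x[0])
    weights = [0.0] * n_columns
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_columns
        grad_b = 0.0
        for row, label in zip(meta_x, y):
            logit = bias + sum(w * x for w, x in zip(weights, row))
            error = sigmoid(logit) - label
            for i, x in enumerate(row):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / len(y) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(y)
    return weights, bias


def predict(weights, bias, row):
    """Stacked probability that a candidate site is m5C."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
```

In use, each of the 44 pairs from `baseline_grid()` would be trained and scored on held-out data, `select_top` would keep the 20 best, and their out-of-fold probabilities would form the rows passed to `fit_meta_learner`. Training the meta-learner on out-of-fold predictions rather than in-sample ones is the usual way to keep a stacked model from overfitting to its baselines.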