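Feature selection
Computer science
Ribonucleic acid (RNA)
Computational biology
Gradient boosting
Data mining
Machine learning
Biology
Gene
Random forest
Biochemistry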
Authors
Wei Su,Xue-Qin Xie,Xiaowei Liu,Dong Gao,Cai-Yi Ma,Hasan Zulfiqar,Hui Yang,Hao Lin,Xiaolong Yu,Yanwen Li
Identifier
DOI:10.1016/j.ijbiomac.2022.11.299
Abstract
RNA N4-acetylcytidine (ac4C) is the acetylation of cytidine at the nitrogen-4 position, a highly conserved RNA modification involved in a variety of biological processes. Accurate genome-wide identification of ac4C sites is therefore vital for understanding the regulatory mechanisms of gene expression. In this work, we developed a novel predictor, iRNA-ac4C, to identify ac4C sites in human mRNA using three feature extraction methods: nucleotide composition, nucleotide chemical property, and accumulated nucleotide frequency. The minimum Redundancy Maximum Relevance (mRMR) algorithm, combined with an incremental feature selection (IFS) strategy, was then used to select the optimal feature subset. On this subset, the best ac4C classification model was trained with a gradient boosting decision tree (GBDT) under 10-fold cross-validation. Results on an independent test set indicate that the proposed method has encouraging generalization ability. For the convenience of other researchers, we built a user-friendly web server, freely available at http://lin-group.cn/server/iRNA-ac4C/. We hope this tool can provide guidance for wet-lab researchers.
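The three feature encodings named in the abstract can be illustrated with a short Python sketch. It assumes input windows contain only A/C/G/U (T is converted to U), uses the nucleotide chemical property (NCP) table common in the RNA-modification literature (ring structure, functional group, hydrogen bonding), and takes "nucleotide composition" to mean normalized 1-mer and 2-mer frequencies. The k values and the function names `kmer_composition`, `ncp_anf`, and `encode` are hypothetical choices for illustration, not iRNA-ac4C's exact settings.

```python
from itertools import product

# Assumed NCP convention: (ring structure, functional group, hydrogen bonding).
# The paper's exact table may differ.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}
ALPHABET = "ACGU"


def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized frequency of every k-mer over ACGU, in lexicographic order."""
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows with non-ACGU symbols are skipped
            counts[window] += 1
    return [counts[km] / total for km in kmers]


def ncp_anf(seq: str) -> list[float]:
    """Per-position NCP triple followed by its ANF density (4 values per nucleotide)."""
    features = []
    for i, nt in enumerate(seq):
        # ANF: fraction of positions 0..i holding the same nucleotide as position i
        density = seq[: i + 1].count(nt) / (i + 1)
        features.extend(NCP[nt])
        features.append(density)
    return features


def encode(seq: str) -> list[float]:
    """Concatenate 1-mer and 2-mer composition with the NCP+ANF profile."""
    seq = seq.upper().replace("T", "U")
    return kmer_composition(seq, 1) + kmer_composition(seq, 2) + ncp_anf(seq)
```

For example, `encode("ACGUAC")` yields 4 + 16 + 6 × 4 = 44 values. Vectors built this way could then be ranked by mRMR, pruned by IFS, and passed to a gradient boosting classifier (for instance scikit-learn's `GradientBoostingClassifier`); those steps are not shown here.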