鉴定(生物学)
序列(生物学)
5-甲基胞嘧啶
财产(哲学)
化学
算法
计算生物学
计算机科学
生物化学
生物
基因
DNA甲基化
植物
认识论
哲学
基因表达
作者
Lichao Zhang,Xueting Wang,Kang Xiao,Liang Kong
出处
期刊:Letters in Organic Chemistry
[Bentham Science]
日期:2024-01-26
卷期号:21 (8): 695-706
标识
DOI:10.2174/0115701786277281231228093405
摘要
Abstract: N4-methylcytosine (4mC) is one of the most important epigenetic modifications, which plays a significant role in biological progress and helps explain biological functions. Although biological experiments can identify potential 4mC sites, they are limited due to the experimental environment and labor-intensive process. Therefore, it is crucial to construct a computational model to identify the 4mC sites. Some computational methods have been proposed to identify the 4mC sites, but some problems should not be ignored, such as those presented as follows: (1) a more accurate algorithm is required to improve the prediction, especially for Matthew’s correlation coefficient (MCC); (2) easier method is needed for clinical research to design medicine or treat disease. Considering these aspects, an effective algorithm using comprehensible encoding in multiple species was proposed in this study. Since nucleotide arrangement and its property information could reflect the sequence structure and function, several feature vectors have been developed based on nucleotide energy information, trinucleotide energy information, and nucleotide chemical property information. Besides, feature effect has been analyzed to select the optimal feature vectors for multiple species. Finally, the optimal feature vectors were inputted into the CatBoost algorithm to construct the identification model. The evaluation results showed that our study obtained the highest MCC, i.e., 2.5%~11.1%, 1.4%~17.8%, 1.1%~7.6%, and 2.3%~18.0% higher than previous models for the A. thaliana, C. elegans, D. melanogaster, and E. coli datasets, respectively. These satisfactory results reflect that the proposed method is available to identify 4mC sites in multiple species, especially for MCC. It could provide a reasonable supplement for biological research.
科研通智能强力驱动
Strongly Powered by AbleSci AI