An Effective Algorithm Based on Sequence and Property Information for N4-methylcytosine Identification in Multiple Species

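Identification (biology), Sequence (biology), 5-Methylcytosine, Property (philosophy), Chemistry, Algorithm, Computational biology, Computer science, Biochemistry, Biology, Gene, DNA methylation, Botany, Epistemology, Philosophy, Gene expression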
Authors
Lichao Zhang, Xueting Wang, Kang Xiao, Liang Kong
Source
Journal: Letters in Organic Chemistry [Bentham Science]
Volume (issue), pages: 21 (8): 695-706
Identifier
DOI:10.2174/0115701786277281231228093405
Abstract

N4-methylcytosine (4mC) is one of the most important epigenetic modifications; it plays a significant role in biological processes and helps explain biological functions. Although biological experiments can identify potential 4mC sites, they are constrained by experimental conditions and are labor-intensive. It is therefore crucial to construct a computational model for identifying 4mC sites. Several computational methods have been proposed, but two problems remain: (1) a more accurate algorithm is required to improve prediction, especially as measured by the Matthews correlation coefficient (MCC); (2) an easier method is needed for clinical research aimed at designing medicines or treating disease. With these aspects in mind, this study proposes an effective algorithm that uses comprehensible encoding across multiple species. Because the arrangement of nucleotides and their property information can reflect sequence structure and function, several feature vectors were developed based on nucleotide energy information, trinucleotide energy information, and nucleotide chemical property information. The effect of each feature was then analyzed to select the optimal feature vectors for multiple species. Finally, the optimal feature vectors were fed into the CatBoost algorithm to construct the identification model. The evaluation results showed that the proposed model obtained the highest MCC, exceeding previous models by 2.5% to 11.1%, 1.4% to 17.8%, 1.1% to 7.6%, and 2.3% to 18.0% on the A. thaliana, C. elegans, D. melanogaster, and E. coli datasets, respectively. These results indicate that the proposed method can identify 4mC sites in multiple species and performs particularly well in terms of MCC. It could provide a reasonable supplement to biological research.
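
As a rough illustration of two ideas named in the abstract, the sketch below encodes a DNA sequence with a nucleotide chemical property (NCP) scheme and computes the MCC from confusion-matrix counts. The abstract does not give the authors' exact encoding, so the three-bit codes used here (ring structure, functional group, hydrogen bonding) are an assumption taken from the scheme commonly used in 4mC predictors. The function names `encode_ncp` and `mcc` are hypothetical. The energy-based features and the CatBoost model are not reproduced.

```python
import math

# Assumed NCP codes (not confirmed by the abstract):
# (ring structure, functional group, hydrogen bonding).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode_ncp(sequence):
    """Flatten a DNA sequence into a list of NCP bits (3 per nucleotide)."""
    features = []
    for base in sequence.upper():
        # Unknown symbols such as N are encoded as all zeros.
        features.extend(NCP.get(base, (0, 0, 0)))
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

In a pipeline like the one described, vectors such as those returned by `encode_ncp` would be concatenated with the other feature groups before being passed to a classifier, and `mcc` would be evaluated on held-out predictions. MCC ranges from -1 to 1, which makes it more informative than accuracy when the positive and negative classes are imbalanced.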