MLDSPP: Bacterial Promoter Prediction Tool Using DNA Structural Properties with Machine Learning and Explainable AI

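Keywords: Interpretability, Machine learning, Artificial intelligence, Promoter, Computational biology, Computer science, Genome, Genomics, Bacterial genome size, Gene, Biology, Genetics, Gene expression
Authors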
Subhojit Paul,Kaushika Olymon,Gustavo Sganzerla Martinez,Sharmilee Sarkar,Venkata Rajesh Yella,Aditya Kumar
Source
Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Volume (issue): 64 (7): 2705-2719 Citations: 5
Identifier
DOI:10.1021/acs.jcim.3c02017
Abstract

Bacterial promoters play a crucial role in gene expression by serving as docking sites for the transcription initiation machinery. However, accurately identifying promoter regions in bacterial genomes remains a challenge due to their diverse architecture and variations. In this study, we propose MLDSPP (Machine Learning and Duplex Stability based Promoter prediction in Prokaryotes), a machine learning-based promoter prediction tool, to comprehensively screen bacterial promoter regions in 12 diverse genomes. We leveraged biologically relevant and informative DNA structural properties, such as DNA duplex stability and base stacking, and state-of-the-art machine learning (ML) strategies to gain insights into promoter characteristics. We evaluated several machine learning models, including Support Vector Machines, Random Forests, and XGBoost, and assessed their performance using accuracy, precision, recall, specificity, F1 score, and MCC metrics. Our findings reveal that XGBoost outperformed other models and current state-of-the-art promoter prediction tools, namely Sigma70pred and iPromoter2L, achieving F1-scores >95% in most systems. Significantly, the use of one-hot encoding for representing nucleotide sequences complements these structural features, enhancing our XGBoost model's predictive capabilities. To address the challenge of model interpretability, we incorporated explainable AI techniques using Shapley values. This enhancement allows for a better understanding and interpretation of the predictions of our model. In conclusion, our study presents MLDSPP as a novel, generic tool for predicting promoter regions in bacteria, utilizing original downstream sequences as nonpromoter controls. This tool has the potential to significantly advance the field of bacterial genomics and contribute to our understanding of gene regulation in diverse bacterial systems.
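The abstract mentions one-hot encoding of nucleotide sequences and a set of evaluation metrics (accuracy, precision, recall, specificity, F1 score, and MCC). The following is a minimal sketch of both ideas using only the Python standard library. It is not the MLDSPP implementation: the function names `one_hot_encode` and `classification_metrics`, the A/C/G/T column order, the all-zero treatment of ambiguous bases, and the toy labels are all assumptions made for illustration.

```python
import math

NUCLEOTIDES = "ACGT"  # assumed column order; the paper's ordering is not stated here


def one_hot_encode(sequence):
    """Flatten a DNA sequence into a vector with four 0/1 entries per base."""
    encoded = []
    for base in sequence.upper():
        # Assumption: ambiguous bases such as N map to four zeros.
        encoded.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return encoded


def classification_metrics(y_true, y_pred):
    """Compute binary metrics where 1 = promoter and 0 = nonpromoter."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    print(one_hot_encode("ACGT"))
    # Toy labels, not data from the study.
    print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a pipeline like the one the abstract describes, vectors of this kind would be combined with structural features such as duplex stability before being passed to a classifier; how MLDSPP combines them is not specified in the abstract.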