MLDSPP: Bacterial Promoter Prediction Tool Using DNA Structural Properties with Machine Learning and Explainable AI

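Interpretability; Machine learning; Artificial intelligence; Promoter; Computational biology; Computer science; Genome; Genomics; Bacterial genome size; Gene; Biology; Genetics; Gene expression
Authors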
Subhojit Paul,Kaushika Olymon,Gustavo Sganzerla Martinez,Sharmilee Sarkar,Venkata Rajesh Yella,Aditya Kumar
Source
Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Volume (issue): 64 (7): 2705-2719 Citations: 9
Identifier
DOI:10.1021/acs.jcim.3c02017
Abstract

Bacterial promoters play a crucial role in gene expression by serving as docking sites for the transcription initiation machinery. However, accurately identifying promoter regions in bacterial genomes remains a challenge due to their diverse architecture and variations. In this study, we propose MLDSPP (Machine Learning and Duplex Stability based Promoter prediction in Prokaryotes), a machine learning-based promoter prediction tool, to comprehensively screen bacterial promoter regions in 12 diverse genomes. We leveraged biologically relevant and informative DNA structural properties, such as DNA duplex stability and base stacking, and state-of-the-art machine learning (ML) strategies to gain insights into promoter characteristics. We evaluated several machine learning models, including Support Vector Machines, Random Forests, and XGBoost, and assessed their performance using accuracy, precision, recall, specificity, F1 score, and MCC metrics. Our findings reveal that XGBoost outperformed other models and current state-of-the-art promoter prediction tools, namely Sigma70pred and iPromoter2L, achieving F1-scores >95% in most systems. Significantly, the use of one-hot encoding for representing nucleotide sequences complements these structural features, enhancing our XGBoost model's predictive capabilities. To address the challenge of model interpretability, we incorporated explainable AI techniques using Shapley values. This enhancement allows for a better understanding and interpretation of the predictions of our model. In conclusion, our study presents MLDSPP as a novel, generic tool for predicting promoter regions in bacteria, utilizing original downstream sequences as nonpromoter controls. This tool has the potential to significantly advance the field of bacterial genomics and contribute to our understanding of gene regulation in diverse bacterial systems.
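To make two of the ingredients named in the abstract concrete, the following is a minimal Python sketch of one-hot encoding for nucleotide sequences and of the six reported evaluation metrics (accuracy, precision, recall, specificity, F1 score, MCC) computed from a binary confusion matrix. It is not the MLDSPP implementation: the function names `one_hot_encode` and `binary_metrics`, the A/C/G/T column order, the all-zero treatment of ambiguous bases, and the toy labels are all assumptions made for illustration.

```python
import math

# Column order of the one-hot vector (an assumption of this sketch).
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return a flat list with four 0/1 entries per base, in A, C, G, T order."""
    encoded = []
    for base in sequence.upper():
        # Bases outside A/C/G/T (for example N) become four zeros (assumption).
        encoded.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return encoded


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Compute the six metrics for labels where 1 = promoter, 0 = nonpromoter."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * precision * recall, precision + recall),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Toy data only; these are not results from the paper.
    print(one_hot_encode("ACGT"))
    print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

In a fuller pipeline, vectors like these would be concatenated with structural-property features (such as duplex stability profiles) before being passed to a classifier; that step is omitted here because the abstract does not specify how the features are combined.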