SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins

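Sapphire; Computer science; Identification (biology); Artificial intelligence; Feature (linguistics); Machine learning; Set (abstract data type); Baseline (sea); Data mining; Algorithm; Oceanography; Optics; Physics; Geology; Philosophy; Biology; Botany; Programming language; Laser; Linguistics
Authors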
Phasit Charoenkwan,Nalini Schaduangrat,Mohammad Ali Moni,Píetro Lió,Balachandran Manavalan,Watshara Shoombuatong
Source
Journal: Computers in Biology and Medicine [Elsevier]
Volume/Issue: 146: 105704-105704 Citations: 40
Identifier
DOI:10.1016/j.compbiomed.2022.105704
Abstract

Thermophilic proteins (TPPs) are important in protein biochemistry and in the development of new enzymes, so computational methods that identify TPPs accurately and rapidly are urgently needed. Several computational methods have been developed for TPP identification to date; however, some limitations in performance and utility remain. In this study, we present a novel computational method, SAPPHIRE, that identifies TPPs more accurately using only sequence information, without any need for structural information. We combined twelve feature encodings representing different perspectives with six popular machine learning algorithms to train 72 baseline models and extract the key information of TPPs. Subsequently, the informative predicted probabilities from the baseline models were mined and selected using a genetic algorithm in conjunction with a self-assessment-report approach. Finally, the meta-predictor, SAPPHIRE, was built and optimized using the optimal feature set. In the 10-fold cross-validation test, SAPPHIRE showed superior predictive performance compared with several baseline models. Moreover, on the independent test, SAPPHIRE yielded an accuracy of 0.942 and a Matthews correlation coefficient of 0.884, which were 7.68 and 5.12% higher, respectively, than those of existing methods. The proposed computational approach is anticipated to facilitate large-scale identification of TPPs and accelerate their applications in the food industry. The codes and datasets are available at https://github.com/plenoi/SAPPHIRE.
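
The stacking idea described in the abstract (baseline models trained per feature encoding, whose predicted probabilities become the inputs of a meta-predictor) can be illustrated with a minimal Python sketch. This is not the SAPPHIRE implementation: the toy sequences, the two encodings, the nearest-centroid baseline learner, and the logistic-regression meta-learner are all assumptions chosen to keep the example dependency-free, and the genetic-algorithm selection step is omitted. The authors' actual code is in the repository linked above.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical coarse grouping of residues, used only for illustration.
GROUPS = {"charged": "DEKR", "hydrophobic": "AVLIMFWC", "polar": "STNQYHGP"}

def aac(seq):
    """Encoding 1: fraction of each of the 20 amino acids."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def group_composition(seq):
    """Encoding 2: fraction of residues in each coarse group."""
    return [sum(seq.count(a) for a in members) / len(seq)
            for members in GROUPS.values()]

def fit_centroids(X, y):
    """Toy baseline learner: the mean feature vector of each class."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        model[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict_proba(model, x):
    """Score in [0, 1]: relative closeness to the class-1 centroid."""
    d0, d1 = math.dist(x, model[0]), math.dist(x, model[1])
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

def out_of_fold_probs(X, y, k):
    """Score every sample with a model that did not see it during fitting."""
    probs = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), k):
            probs[i] = predict_proba(model, X[i])
    return probs

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Toy meta-learner: logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, t in zip(Z, y):
            g = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            w = [wi - lr * g * zi for wi, zi in zip(w, z)]
            b -= lr * g
    return w, b

# Made-up sequences (not real proteins): label 1 = thermophilic, 0 = not.
seqs = ["MKEELKRKIEELLKK", "MRKEIEELKKRLEEA", "MEEKRKLIEEAKKRL", "MKKLEERIKELEEKR",
        "MSTQNSAGTQSNGSA", "MTSGNQSATSQGNTS", "MQSNGTSASQTNGSS", "MGSATNQSSTGQNSA"]
y = [1, 1, 1, 1, 0, 0, 0, 0]
ENCODERS = [aac, group_composition]

# Level 0: one column of out-of-fold probabilities per encoding.
columns = [out_of_fold_probs([enc(s) for s in seqs], y, k=4) for enc in ENCODERS]
meta_features = [list(row) for row in zip(*columns)]

# Level 1: the meta-learner is trained on the baseline probabilities.
w, b = fit_logistic(meta_features, y)

# To score a new sequence, refit each baseline on all data, then stack.
base_models = [fit_centroids([enc(s) for s in seqs], y) for enc in ENCODERS]

def predict(seq):
    z = [predict_proba(m, enc(seq)) for m, enc in zip(base_models, ENCODERS)]
    return sigmoid(b + sum(wi * zi for wi, zi in zip(w, z)))

print(predict("MKELLKKREEIEKLK"))
```

Using out-of-fold probabilities for the meta-features keeps the meta-learner from being trained on scores that the baseline models produced for their own training samples. In the setup the abstract describes, the meta-feature matrix would have up to 72 columns (twelve encodings times six algorithms) before selection, rather than the two columns used here.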
