Protein encoder: An autoencoder-based ensemble feature selection scheme to predict protein secondary structure

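Feature selection; Autoencoder; Computer science; Pattern recognition (psychology); Random forest; Artificial intelligence; Feature (linguistics); Univariate; Data mining; Machine learning; Artificial neural network; Multivariate statistics; Linguistics; Philosophy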
Authors
Uzma, Usama Manzoor, Zahid Halim
Source
Journal: Expert Systems With Applications [Elsevier]
Volume: 213, Article 119081 | Cited by: 9
Identifier
DOI: 10.1016/j.eswa.2022.119081
Abstract

Proteins play a vital role in the human body because they perform important metabolic tasks. Experimental identification of protein structure is expensive and time-consuming. Predicting protein secondary structure is an important step toward identifying a protein's tertiary structure and its folds. Selecting a relevant feature subset from the high-dimensional protein primary sequence is key to improving the accuracy of Protein Secondary Structure Prediction (PSSP). This work presents a novel method for the PSSP problem based on a two-phase feature selection technique. The first phase uses an unsupervised autoencoder for feature extraction. The second phase is an ensemble of three feature selection methods: generic univariate select, recursive feature elimination, and Pearson's correlation. This phase combines the multiple feature subsets using mutual information to select the optimum feature subset. The resulting feature subset is then passed to different classifiers, including random forest, decision tree, and multilayer perceptron. Two sets of experiments are performed on five datasets to assess the proposed work. The proposed solution is compared with three state-of-the-art methods on Q3 accuracy, Q8 accuracy, and segment overlap score. The results show that the proposed framework outperforms the earlier methods in the majority of cases. The proposed framework achieves Q8 accuracies of 82%, 80%, 79%, 73%, and 74% and Q3 accuracies of 90%, 90%, 92%, 79%, and 74% on the CB6133, CB6133-filtered, CB513, CASP10, and CASP11 datasets, respectively.
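
The Q8 and Q3 figures quoted above are per-residue accuracies over eight and three secondary structure classes. The sketch below illustrates how such scores are commonly computed; it is not the authors' evaluation code. The 8-to-3 state reduction shown is a widely used convention and is an assumption here, the function names are hypothetical, and the two example strings are made up for illustration. The segment overlap score is not covered.

```python
# Commonly used reduction of the eight DSSP states to three classes.
# Assumption: the paper may use a different mapping or coil symbol.
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",  # helix types -> helix
    "E": "E", "B": "E",            # strand and bridge -> strand
    "T": "C", "S": "C", "L": "C",  # turn, bend, loop -> coil
}


def q8_accuracy(true_ss: str, pred_ss: str) -> float:
    """Percentage of residues whose 8-state label is predicted exactly."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


def q3_accuracy(true_ss: str, pred_ss: str) -> float:
    """Percentage of residues correct after reducing both strings to 3 states."""
    def reduce3(labels: str) -> str:
        return "".join(DSSP8_TO_3[c] for c in labels)
    return q8_accuracy(reduce3(true_ss), reduce3(pred_ss))


# Hypothetical 16-residue example: three residues differ in the 8-state
# labels, but each mismatch stays within the same 3-state class.
true_ss = "LHHHHGGTEEEEBSLL"
pred_ss = "LHHHHHHTEEEEESLL"
print(q8_accuracy(true_ss, pred_ss))  # 81.25
print(q3_accuracy(true_ss, pred_ss))  # 100.0
```

Because the three-class score merges labels before comparing, Q3 is never lower than Q8 for the same prediction, which is consistent with the accuracies reported in the abstract.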
