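Support vector machine
Artificial intelligence
Thermophile
Computer science
Pattern recognition (psychology)
Jackknife resampling
Machine learning
Principal component analysis
Data mining
Mathematics
Chemistry
Statistics
Biochemistry
Enzyme
Estimator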
Authors
Xianfang Wang,Peng Gao,Yifeng Liu,Hongfei Li,Lu Fan
Source
Journal: Current Bioinformatics
[Bentham Science]
Date: 2020-02-07
Volume (issue): 15 (5): 493-502
Citations: 95
Identifier
DOI:10.2174/1574893615666200207094357
Abstract
Background: Thermophilic proteins remain active at high temperatures, so studying them is important for understanding the thermal stability of proteins. Objective: To address the low precision and low efficiency of thermophilic protein prediction, this paper proposes a prediction method based on feature fusion and machine learning. Methods: For the selected thermophilic datasets, each protein sequence was first represented by a fused feature vector combining g-gap dipeptide, entropy density, and autocorrelation coefficient. Kernel Principal Component Analysis (KPCA) was then used to reduce the dimensionality of these sequence features, shortening training time and improving efficiency. Finally, a classification model was built using a classification algorithm. Results: A variety of classification algorithms were trained and tested on the selected thermophilic dataset. In this comparison, the Support Vector Machine (SVM) achieved an accuracy of over 92% under the jackknife test, and the other evaluation metrics also indicated that the SVM performed best. Conclusion: Owing to an effective feature representation method and a robust classifier, the proposed method is suitable for predicting thermophilic proteins and is superior to most reported methods.
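The g-gap dipeptide feature named in the Methods can be illustrated with a minimal sketch. The abstract does not give the authors' exact definition or normalization, so the code below assumes the commonly used form: count each ordered pair of residues separated by g positions and divide by the number of counted pairs, giving a 400-dimensional vector. The function name `g_gap_dipeptide_composition`, the decision to skip non-standard residues, and the toy sequence are all illustrative assumptions, not details from the paper.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, g=0):
    """Return a 400-dimensional g-gap dipeptide frequency vector.

    Assumed definition: a g-gap dipeptide is the pair of residues at
    positions i and i + g + 1; g = 0 gives adjacent dipeptides.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:  # sequence too short or no valid pairs
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical toy sequence, for illustration only.
features = g_gap_dipeptide_composition("MKVLAAGIVK", g=1)
print(len(features))
```

In a pipeline like the one described, vectors of this kind would be concatenated with the other feature groups before dimensionality reduction and classification; those later steps are not shown here.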