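Feature selection
Computer science
Overfitting
Executable file
Classifier (UML)
Random forest
Data mining
Feature (linguistics)
Selection (genetic algorithm)
Feature extraction
Pattern recognition (psychology)
Artificial intelligence
Algorithm
Machine learning
Operating system
Philosophy
Artificial neural network
Linguistics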
Authors
Fanny Dao, Hao Lv, Zhao‐Yue Zhang, Hao Lin
Source
Journal: Current Bioinformatics
[Bentham Science]
Date: 2021-10-08
Volume (issue): 17 (3): 238-244
Citations: 17
Identifier
DOI: 10.2174/1574893616666211007102747
Abstract
Background: Feature extraction often runs into the curse of dimensionality. The extracted features may carry a large amount of redundant information, which increases the computational burden and leads to overfitting. Objective: Feature selection is an important strategy for overcoming the problems caused by the curse of dimensionality. In most machine learning tasks, the features determine the upper limit of model performance, so new feature selection methods are needed to remove redundant features. Methods: In this paper, we introduce a new technique for optimizing sequence features based on the binomial distribution (BD). First, the principle of the binomial distribution algorithm is introduced in detail. Then, the proposed algorithm is compared with other commonly used feature selection methods on three different types of datasets, using a Random Forest classifier with the same parameters in every case. Results: The results confirm that BD yields a promising improvement in feature selection and classification accuracy. Conclusion: Finally, we provide the source code and an executable program package (http://lingroup.cn/server/BDselect/), with which users can easily apply our algorithm in their own research.
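The abstract does not spell out the BD scoring rule, so the following is only a minimal sketch of one common way to rank count-based sequence features (for example, k-mer occurrences) with a binomial test. It assumes that the prior probability of a class is its share of all feature occurrences, and that a feature's score is one minus the binomial tail probability of seeing at least its observed count in that class. The function names `binomial_tail` and `bd_rank` and the `counts` layout are hypothetical and are not taken from the BDselect package.

```python
from math import comb


def binomial_tail(n, k, q):
    """Return P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, m) * q**m * (1 - q) ** (n - m) for m in range(k, n + 1))


def bd_rank(counts):
    """Rank features by a binomial-distribution confidence score.

    counts[c][f] is the number of occurrences of feature f in class c
    (an assumed input layout). Returns (order, scores), where order lists
    feature indices from highest to lowest score.
    """
    n_features = len(counts[0])
    class_totals = [sum(row) for row in counts]
    grand_total = sum(class_totals)

    scores = []
    for f in range(n_features):
        n_f = sum(row[f] for row in counts)  # occurrences of f in all classes
        best = 0.0
        for c, row in enumerate(counts):
            q = class_totals[c] / grand_total  # assumed prior for class c
            p = binomial_tail(n_f, row[f], q)  # chance of a count this high
            best = max(best, 1.0 - p)          # confidence that f is enriched
        scores.append(best)

    order = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return order, scores


# Toy data: two classes, three features. Features 0 and 2 are each
# concentrated in one class; feature 1 is spread evenly.
order, scores = bd_rank([[18, 10, 2], [2, 10, 18]])
print(order, [round(s, 4) for s in scores])
```

In a comparison like the one described in the abstract, the top-ranked features from such a score would be passed to the same Random Forest configuration used for every competing selection method, so that only the feature subset varies.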