Feature selection
Computer science
Overfitting
Artificial intelligence
Pattern recognition (psychology)
Feature (linguistics)
Dimensionality reduction
Feature extraction
Classification
Curse of dimensionality
k-nearest neighbors algorithm
Data mining
Machine learning
Algorithm
Artificial neural network
Philosophy
Linguistics
Authors
Jia Liu, Dong Li, Wangweiyi Shan, Shulin Liu
Identifiers
DOI:10.1016/j.asoc.2023.111018
Abstract
Directly applying high-dimensional data to machine learning leads to the curse of dimensionality and may cause model overfitting. Feature selection can effectively reduce the number of features, but a single feature selection algorithm suffers from instability and poor generalization, while ensemble feature selection algorithms struggle to find a suitable strategy for aggregating feature subsets. To solve these two problems, we propose a feature selection method based on multiple feature subset extraction and result fusion (FSM). FSM generates multiple feature subsets to improve stability: it uses mutual information to mine the relationship between features and class labels, and fast non-dominated sorting uses this correlation to place similar features in the same layer. A layer optimization algorithm is then proposed to combine the layered features into multiple distinct feature subsets. To evaluate the quality of these subsets, FSM jointly uses precision, recall, and F-score, removing ineffective subsets. The idea of fusion is applied at the output stage: the remaining superior feature subsets each train a classifier, and the classifiers' results are fused by voting to produce the final output, simplifying the aggregation step of ensemble feature selection. Experiments on 20 well-known datasets show that FSM effectively reduces data dimensionality and improves classification performance over the original datasets, and that it compares favorably with other dimensionality reduction algorithms in both classification performance and efficiency.
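The abstract mentions two concrete ingredients that can be illustrated in isolation: scoring each feature by its mutual information with the class labels, and fusing the predictions of several classifiers by majority voting. The sketch below is a minimal, generic illustration of those two steps for discrete data; it is not the paper's FSM implementation (the layering via fast non-dominated sorting and the layer optimization algorithm are omitted), and the function names are my own.

```python
from collections import Counter
from math import log2

def mutual_information(feature, labels):
    # I(X;Y) for a discrete feature column and discrete class labels,
    # estimated from empirical joint/marginal frequencies (in bits).
    n = len(feature)
    px = Counter(feature)          # marginal counts of feature values
    py = Counter(labels)           # marginal counts of class labels
    pxy = Counter(zip(feature, labels))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # log2( p(x,y) / (p(x) p(y)) ), written with counts
        mi += p_joint * log2(p_joint * n * n / (px[x] * py[y]))
    return mi

def majority_vote(predictions):
    # Fuse per-classifier predictions (one list per classifier)
    # sample-wise by taking the most common vote.
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions)]
```

For example, a feature identical to the labels attains the label entropy (1 bit for balanced binary labels), while an independent feature scores 0; in an FSM-like pipeline such scores would guide which features enter each candidate subset before the voting step combines the subset-trained classifiers.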