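Feature selection
Pattern recognition (psychology)
Feature (linguistics)
Cosine similarity
Similarity (geometry)
Coefficient matrix
Computer science
Hash function
Discretization
Mathematics
Data preprocessing
Artificial intelligence
Algorithm
Data mining
Mathematical analysis
Philosophy
Physics
Image (mathematics)
Quantum mechanics
Eigenvalues and eigenvectors
Linguistics
Computer security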
Authors
Gaoteng Yuan, Yi Zhai, Jiansong Tang, Xiaofeng Zhou
Source
Journal: Neurocomputing
[Elsevier]
Date: 2023-10-01
Volume/issue: 552: 126564-126564
Citations: 2
Identifier
DOI:10.1016/j.neucom.2023.126564
Abstract
Feature selection (FS) based on mutual information (MI) metrics requires the data to be discretized during preprocessing, which is a convenient way to identify correlations between features. However, discretization often causes information loss. To address this problem, this paper proposes an FS algorithm based on the cosine similarity coefficient and an information measurement criterion (CSCIM_FS). First, the MI between each feature and the tags is calculated, and the features are sorted by MI. Then, a feature matrix is constructed to transform the one-dimensional feature sequence into a two-dimensional square matrix. Next, a cosine transform is applied to obtain the high-frequency components of the feature matrix, which are sampled to derive the hash fingerprint of the matrix. After that, the similarity between every pair of features is calculated from their hash fingerprints. Finally, the feature weight is calculated from the tags, the MI, and the similarity between features, and a key feature subset is obtained and used for feature selection. Experimental results on several UCI public datasets show that CSCIM_FS selects a feature subset with high accuracy and that it performs better than MIM, CMIM, mRMR, and other algorithms.
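The fingerprint and similarity steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the square matrix is filled, which coefficients are sampled, or how the samples become a hash, so the choices below (dropping values that do not fit the square, sampling the bottom-right coefficient block as the high-frequency region, thresholding against the block mean, and comparing fingerprints by cosine similarity) are all assumptions, as are the function names and the `block` parameter. MI ranking and the final feature weighting are omitted.

```python
import math

def to_square(values):
    """Arrange a 1-D feature column into the largest k x k matrix that fits.
    Assumption: values beyond k*k are simply dropped."""
    k = math.isqrt(len(values))
    return [list(values[r * k:(r + 1) * k]) for r in range(k)]

def dct_1d(x):
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * u / n) for i in range(n))
            for u in range(n)]

def dct_2d(matrix):
    """Separable 2-D DCT-II: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def fingerprint(values, block=2):
    """Hypothetical hash: sample the bottom-right (high-frequency) block of
    DCT coefficients and encode each as +1 or -1 relative to the block mean."""
    coeffs = dct_2d(to_square(values))
    k = len(coeffs)
    sample = [coeffs[r][c]
              for r in range(k - block, k)
              for c in range(k - block, k)]
    mean = sum(sample) / len(sample)
    return [1 if v > mean else -1 for v in sample]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up feature columns with 16 samples each (a 4 x 4 feature matrix).
f1 = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.55, 0.6,
      0.3, 0.7, 0.15, 0.05, 0.95, 0.45, 0.5, 0.25]
f2 = [2.0 * v for v in f1]  # a positively rescaled copy of f1
print(cosine_similarity(fingerprint(f1), fingerprint(f2)))
```

Because the fingerprint keeps only the sign of each sampled coefficient relative to the block mean, it is insensitive to positive rescaling of a feature, which is one plausible reason to compare features this way instead of discretizing them first.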