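Categorical variable
Mahalanobis distance
Cluster analysis
Data mining
Mathematics
Kullback–Leibler divergence
Entropy (arrow of time)
Divergence (linguistics)
Computer science
Pattern recognition (psychology)
Algorithm
Artificial intelligence
Statistics
Quantum mechanics
Physics
Linguistics
Philosophy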
Authors
Elahe Mousavi,Mohammadreza Sehhati
Identifier
DOI:10.1016/j.patcog.2023.109353
Abstract
Distance calculation is straightforward for purely categorical or purely numerical data sets. Defining a unified distance that improves clustering performance on a mixed data set composed of nominal, ordinal, and numerical attributes is, however, very challenging because the attributes differ in nature. In this study, we propose a new distance measure for mixed-type data sets that takes both inter-attribute and intra-attribute information into account, depending on attribute type. Specifically, entropy is used to exploit the inter-attribute information between pairs of categorical attributes, and the Jensen–Shannon divergence is used to exploit the inter-attribute information between categorical and numerical attributes. In addition, a modified version of the Mahalanobis distance is proposed to capture both the intra- and inter-attribute information of numerical attributes. We also introduce a unified framework based on mutual information to control each attribute's contribution to the distance. The proposed distance, combined with spectral clustering, was evaluated extensively on a variety of categorical, numerical, and mixed-type benchmark data sets, and the results demonstrate the efficacy of the proposed method.
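The abstract names several standard quantities without giving their formulas. The sketch below shows only their textbook forms in plain Python: Shannon entropy and mutual information for categorical attributes, the Jensen–Shannon divergence between two discrete distributions, and the ordinary (unmodified) Mahalanobis distance. It is not the authors' method: the paper's modified Mahalanobis distance, how it turns entropy and Jensen–Shannon divergence into inter-attribute terms, and its mutual-information weighting framework are not described in the abstract, and every function name here is hypothetical.

```python
import math
from collections import Counter

# Textbook building blocks only; NOT the paper's modified distance or weighting scheme.


def entropy(values):
    """Shannon entropy (bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired categorical samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """JS(p, q) = 0.5*KL(p||m) + 0.5*KL(q||m), m = midpoint; lies in [0, 1] bit."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mahalanobis(x, y, cov):
    """Standard Mahalanobis distance sqrt((x - y)^T S^{-1} (x - y)).

    Solves S z = (x - y) by Gaussian elimination with partial pivoting
    instead of forming S^{-1} explicitly. Assumes S is nonsingular.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    # Augmented matrix [S | d]
    a = [[float(v) for v in row] + [d[i]] for i, row in enumerate(cov)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution for z = S^{-1} d
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (a[r][n] - s) / a[r][r]
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


if __name__ == "__main__":
    print(entropy(["a", "a", "b", "b"]))                                   # 1.0
    print(jensen_shannon([1.0, 0.0], [0.0, 1.0]))                           # 1.0
    print(mahalanobis([1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]]))  # ~0.8165
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance; a non-diagonal covariance is what lets it account for correlation between numerical attributes, which is the kind of inter-attribute information the abstract refers to.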