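Categorical variable
Cluster analysis
Mathematics
Weighting
Ordinal data
Metric (unit)
Artificial intelligence
Data mining
Perspective (graphical)
Pattern recognition (psychology)
Computer science
Statistics
Operations management
Medicine
Radiology
Economics
Authors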
Yiqun Zhang,Yiu‐ming Cheung
Identifier
DOI:10.1109/tpami.2021.3056510
Abstract
The success of categorical data clustering relies heavily on the distance metric that measures the degree of dissimilarity between two objects. However, most existing clustering methods treat the two categorical subtypes, i.e., nominal and ordinal attributes, in the same way when calculating dissimilarity, without considering the relative order information of the ordinal values. Moreover, there may be interdependence among the nominal and ordinal attributes, which is worth exploring as an indicator of dissimilarity. This paper therefore studies the intrinsic difference and connection between nominal and ordinal attribute values from a graph-like perspective. Accordingly, we propose a novel distance metric that measures the intra-attribute distances of nominal and ordinal attributes in a unified way while preserving the order relationship among ordinal values. Subsequently, we propose a new clustering algorithm that combines the learning of intra-attribute distance weights and the partitioning of data objects into a single learning paradigm rather than two separate steps, thereby circumventing a suboptimal solution. Experiments show the efficacy of the proposed algorithm in comparison with its existing counterparts.
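The limitation the abstract describes, treating ordinal values as if they were nominal, can be illustrated with a minimal sketch. This is not the metric proposed in the paper: it only contrasts plain Hamming distance with a simple rank-based distance. The function names, the `ORDINAL_LEVELS` mapping, and the sample objects are all hypothetical and chosen for illustration.

```python
def hamming_distance(a, b):
    """Count mismatching attributes; every attribute is treated as nominal."""
    return sum(x != y for x, y in zip(a, b))


def order_aware_distance(a, b, ordinal_levels):
    """Illustrative distance that respects the order of ordinal values.

    ordinal_levels maps an attribute index to its ordered list of levels.
    Attributes not listed there are treated as nominal (0/1 mismatch).
    This is a simple normalized rank difference, NOT the paper's metric.
    """
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        levels = ordinal_levels.get(j)
        if levels is None:
            # Nominal attribute: values either match or they do not.
            total += float(x != y)
        else:
            # Ordinal attribute: rank gap scaled to the range [0, 1].
            span = len(levels) - 1
            gap = abs(levels.index(x) - levels.index(y))
            total += gap / span if span > 0 else 0.0
    return total


# Hypothetical data: attribute 0 is nominal, attribute 1 is ordinal.
ORDINAL_LEVELS = {1: ["low", "medium", "high"]}
a = ("red", "low")
b = ("blue", "medium")
c = ("blue", "high")

print(hamming_distance(a, b), hamming_distance(a, c))
print(order_aware_distance(a, b, ORDINAL_LEVELS),
      order_aware_distance(a, c, ORDINAL_LEVELS))
```

Under Hamming distance, `b` and `c` are equally far from `a`, because "low" vs. "medium" and "low" vs. "high" both count as a single mismatch. The rank-based variant places `b` closer to `a` than `c`, which is the kind of order information the abstract says existing methods discard.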