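Categorical variable
Cluster analysis
Data mining
Robustness (evolution)
Computer science
Ordinal data
Mathematics
Artificial intelligence
Algorithm
Cluster (spacecraft)
Pattern recognition (psychology)
Statistics
Gene
Programming language
Biochemistry
Chemistry
Identifier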
DOI:10.1109/dsit55514.2022.9943828
Abstract
In cluster analysis of categorical data, the distance measure between two objects plays a central role. However, most existing categorical distance metrics do not distinguish between nominal and ordinal attributes. As a result, they ignore the order relationship among ordinal values and the information that ordering carries. This paper therefore proposes a novel clustering algorithm that uses a unified framework to measure distances for both nominal and ordinal attributes while respecting the different characteristics of each. The basic idea is that attribute value pairs with a larger co-occurrence probability in the same cluster should have smaller distances. Accordingly, the distances between categories are evaluated dynamically from the current cluster structure of the data samples, and the distances and the cluster assignments are then learned alternately until convergence. Experimental results show that the proposed algorithm is more robust and performs better than existing counterparts on different kinds of categorical data sets.
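The distance-evaluation step described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption for illustration rather than the paper's method: the function names `nominal_value_distances` and `ordinal_value_distances` are hypothetical, the similarity of two values is taken to be the probability that objects carrying them fall in the same cluster, and the ordinal variant simply sums the distances between adjacent ranks so that the order is preserved.

```python
def nominal_value_distances(values, labels, n_values, n_clusters):
    """Distances between the category values of one attribute.

    Assumption (not from the paper): similarity of values a and b is
    sum_k P(k | a) * P(k | b), i.e. the chance that two objects carrying
    a and b share a cluster; distance is 1 minus that similarity.
    """
    # counts[v][k] = number of objects with value v assigned to cluster k
    counts = [[0] * n_clusters for _ in range(n_values)]
    for v, k in zip(values, labels):
        counts[v][k] += 1

    # P(cluster | value); values that never occur get a uniform row.
    probs = []
    for row in counts:
        total = sum(row)
        if total:
            probs.append([c / total for c in row])
        else:
            probs.append([1.0 / n_clusters] * n_clusters)

    dist = [[0.0] * n_values for _ in range(n_values)]
    for a in range(n_values):
        for b in range(n_values):
            if a != b:
                same = sum(pa * pb for pa, pb in zip(probs[a], probs[b]))
                dist[a][b] = 1.0 - same
    return dist


def ordinal_value_distances(nominal_dist):
    """Order-preserving variant (assumption): the distance between ranks
    i < j is the sum of the distances between adjacent ranks in between."""
    n = len(nominal_dist)
    position = [0.0]
    for i in range(n - 1):
        position.append(position[-1] + nominal_dist[i][i + 1])
    return [[abs(position[a] - position[b]) for b in range(n)] for a in range(n)]


# Toy data: one attribute with three values, six objects, two clusters.
values = [0, 0, 1, 1, 2, 2]
labels = [0, 0, 0, 1, 1, 1]
d_nominal = nominal_value_distances(values, labels, n_values=3, n_clusters=2)
d_ordinal = ordinal_value_distances(d_nominal)
```

In an alternating scheme of the kind the abstract describes, these distances would be recomputed after each reassignment of objects to clusters, and the loop would stop once the assignments no longer change.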