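Categorical variable
Cluster analysis
Pattern recognition (psychology)
Data mining
Probabilistic logic
Artificial intelligence
Computer science
Mathematics
Kernel (algebra)
Machine learning
Combinatorics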
Authors
Lifei Chen, Shengrui Wang, Kaijun Wang, Jianping Zhu
Identifier
DOI:10.1016/j.patcog.2015.09.027
Abstract
Categorical data clustering is an important subject in pattern recognition. Currently, subspace clustering of categorical data remains an open problem due to the difficulties in estimating attribute interestingness according to the statistics of categories in clusters. In this paper, a new algorithm is proposed for clustering categorical data with a novel soft feature-selection scheme, by which each categorical attribute is automatically assigned a weight that correlates with the smoothed dispersion of the categories in a cluster. In the proposed algorithm, dissimilarity between categorical data objects is measured using a probabilistic distance function, based on kernel density estimation for categorical attributes. We also make use of the probabilistic distances to define a cluster validity index for estimating the number of categorical clusters. The suitability of the proposal is demonstrated in an empirical study done with some widely used real-world data sets and synthetic data sets, and the results show its outstanding performance.