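Measure (data warehouse)
Euclidean distance
Cluster analysis
Pairwise comparison
Similarity measure
Metric (unit)
Similarity (geometry)
Mathematics
Pattern recognition (psychology)
k-nearest neighbors algorithm
Distance measure
Artificial intelligence
Computer science
Data mining
Image (mathematics)
Economics
Operations management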
Authors
Zafaryab Rasool, Sunil Aryal, Mohamed Reda Bouadjenek, Richard Dazeley
Identifier
DOI:10.1016/j.patcog.2022.109287
Abstract
Density Peak Clustering (DPC) is a popular state-of-the-art clustering algorithm, which requires pairwise (dis)similarity of data objects to detect arbitrarily shaped clusters. While it is shown to perform well for many applications, DPC remains: (i) not robust for datasets with clusters having different densities, and (ii) sensitive to changes in the units/scales used to represent data. These drawbacks are mainly due to the use of the data-independent similarity measure based on the Euclidean distance. In this paper, we address these issues by proposing an effective data-dependent similarity measure based on Probability Mass, which we call MP-Similarity, and by incorporating it in DPC to create MP-DPC, a data-dependent variant of DPC. We evaluate and compare MP-DPC against diverse baselines using several clustering metrics and datasets. Our experiments demonstrate that: (a) MP-DPC produces better clustering results than DPC using the Euclidean distance and existing data-dependent similarity measures; (b) MP-Similarity coupled with a Shared-Nearest-Neighbor-based density metric in DPC further enhances the quality of clustering results; and (c) unlike DPC with existing data-independent and data-dependent similarity measures, MP-DPC is robust to changes in the units/scales used to represent data. Our findings suggest that MP-Similarity provides a more viable solution for DPC in datasets with unknown distribution or units/scales of features, which is often the case in many real-world applications.
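To make the scale-sensitivity point in the abstract concrete, the following is a minimal sketch of the two quantities that standard DPC derives from pairwise distances: a local density with a cutoff, and the distance to the nearest point of higher density. It uses the plain Euclidean distance and is not the MP-Similarity or MP-DPC method of the paper. The names `dpc_quantities`, `nearest_neighbor`, and `d_c`, as well as the toy data, are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Data-independent Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dpc_quantities(points, d_c, dist=euclidean):
    """Return (rho, delta) for a cutoff-kernel DPC sketch.

    rho[i]   = number of other points closer than d_c to point i
    delta[i] = distance from i to the nearest point with strictly higher rho;
               if no such point exists, the largest distance from i is used.
    """
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


def nearest_neighbor(points, i, dist=euclidean):
    """Index of the point closest to points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return min(others, key=lambda j: dist(points[i], points[j]))


# Toy data (hypothetical): three points described by two features.
original = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
# Same data with the first feature expressed in a unit 1000 times smaller.
rescaled = [(x * 1000.0, y) for x, y in original]

print(nearest_neighbor(original, 0))  # 1
print(nearest_neighbor(rescaled, 0))  # 2
```

Changing the unit of one feature changes which point is closest to point 0, so the densities and the cluster centres that DPC derives from these distances can change as well. This is the behaviour that a data-dependent similarity measure is meant to avoid.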