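Cluster analysis
Scaling
Consistency (knowledge base)
Mathematics
Algorithm
Cluster (spacecraft)
Scale (ratio)
Pattern recognition (psychology)
Computer science
Manifold (fluid mechanics)
Artificial intelligence
Statistics
Physics
Geometry
Mechanical engineering
Engineering
Quantum mechanics
Programming language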
Authors
Xinmin Tao, Wenjie Guo, Chao Ren, Qing Li, Qing He, Rui Liu, Junrong Zou
Identifier
DOI:10.1016/j.ins.2021.08.036
Abstract
Density Peak Clustering (DPC), a density-based clustering algorithm, has recently received great attention for its clustering efficiency and simple implementation. However, empirical studies have shown that the distance measures commonly used in DPC cannot account for global and local consistency simultaneously. As a result, the local densities estimated from them may fail to capture the ground-truth data structure and thus produce poor clustering results, especially when the clusters in a dataset exhibit multi-density manifold structures of different sizes. To address these limitations, we propose a novel density peak clustering algorithm that uses a manifold distance with adjustable global and local consistency. In the proposed algorithm, a novel manifold distance with an exponential term and a scaling factor is introduced to estimate the local densities of all data points. By modifying the exponential term and the scaling factor, we can flexibly adjust the ratio of the distance between data points within the same manifold to the distance between data points on different manifolds. This flexible adjustment helps the estimated local densities reflect the global and local consistency of the data structure more accurately. In addition, to deal effectively with clusters of different densities and sizes, we develop a compensation strategy for the distance to the nearest point with larger density, called the local-scale tuning distance. With the local-scale tuning distance, the underlying centers of clusters with different densities and sizes, especially clusters with low density or small size, stand out clearly in the decision graph, so that the proposed method can accurately identify the number of underlying clusters and thus obtain satisfactory clustering results.
In the experimental part, the effect of the scaling factor on the performance of the proposed technique is discussed, and some suggestions for determining the parameters are given. Theoretical analysis and experimental results on several synthetic and real-world datasets demonstrate that the proposed approach is superior to other existing clustering techniques in terms of three evaluation metrics, with statistical significance.
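For readers unfamiliar with the decision graph mentioned in the abstract, the following is a minimal sketch of the two quantities that baseline DPC plots: the local density of each point and its distance to the nearest point with larger density. It is not the authors' method. It assumes plain Euclidean distance in place of the adjustable manifold distance, uses a Gaussian kernel as one common density choice, and does not implement the local-scale tuning distance. The function name `dpc_decision_values` and the parameter `cutoff` are hypothetical names chosen for illustration.

```python
import numpy as np


def dpc_decision_values(points, cutoff):
    """Return (rho, delta) for a baseline DPC decision graph.

    Assumptions (not from the paper): Euclidean distance and a
    Gaussian kernel density with bandwidth `cutoff`.
    """
    points = np.asarray(points, dtype=float)

    # Pairwise Euclidean distances; the paper replaces this matrix
    # with an adjustable manifold distance.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density: Gaussian kernel sum over all other points
    # (subtract 1.0 to drop each point's contribution to itself).
    rho = np.exp(-(dist / cutoff) ** 2).sum(axis=1) - 1.0

    # delta: distance to the nearest point with strictly larger density.
    # Points with no denser neighbour get their largest distance instead.
    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()

    return rho, delta


if __name__ == "__main__":
    # Two small, well-separated groups; each has one central point.
    pts = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.05, 0.05],
           [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1], [5.05, 5.05]]
    rho, delta = dpc_decision_values(pts, cutoff=0.2)
    # Candidate centers are the points where both rho and delta are large.
    print(np.argsort(rho * delta)[-2:])
```

Points that combine a large density with a large distance to any denser point are the candidate cluster centers. Under the assumptions above, swapping a different distance matrix into `dist` is the step where a manifold-based distance would enter.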