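Cluster analysis
CURE data clustering algorithm
Correlation clustering
Data mining
Computer science
Canopy clustering algorithm
Fuzzy clustering
Data stream clustering
Pattern recognition (psychology)
Single-linkage clustering
Clustering high-dimensional data
Consensus clustering
Artificial intelligence
Determining the number of clusters in a data set
Identifier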
DOI:10.1016/j.asoc.2018.07.026
Abstract
As an unsupervised pattern classification method, clustering partitions an input dataset into groups, or clusters, and plays an important role in identifying the natural structure of the target data. It is widely used in data mining, pattern recognition, image processing, and other fields. However, because of different parameter settings and the random selection of initial centers, traditional clustering algorithms may produce different partitions for the same dataset. A clustering validity index (CVI) is an important tool for evaluating the quality of the results produced by clustering algorithms, but many existing CVIs suffer from complex computation, low time efficiency, and a narrow range of applications. To make clustering more stable, traditional K-means is first improved with an initial center selection method based on density parameters, rather than random selection of the initial centers. Then, to widen the range of application of clustering and to evaluate partition results better, a new variance-based clustering validity index (VCVI) is designed from the viewpoint of the spatial distribution of the data. Finally, a new partitional clustering algorithm that integrates the improved K-means algorithm with the newly introduced VCVI is designed to determine the optimal number of clusters (Kopt) for a wide range of datasets. Furthermore, the commonly used empirical rule Kmax ≤ √n is reasonably explained by the newly designed VCVI. The new algorithm integrated with VCVI is compared with traditional algorithms integrated with five commonly used CVIs. The experimental results show that the new clustering method is more accurate and stable while requiring relatively less running time.
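The overall selection loop described in the abstract (deterministic seeding, K-means, then scoring each candidate K up to √n with a validity index) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract defines neither VCVI nor the density-parameter seeding, so the sketch substitutes the Calinski-Harabasz index and farthest-point seeding. All function names (`seed_centers`, `kmeans`, `ch_index`, `choose_k`) and the toy data are hypothetical.

```python
import math
import numpy as np

def seed_centers(X, k):
    """Deterministic farthest-point seeding.
    ASSUMPTION: a stand-in for the paper's density-based seeding, which the abstract does not specify."""
    # Start from the point closest to the overall mean.
    centers = [X[np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    while len(centers) < k:
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers, dtype=float)

def kmeans(X, k, iters=100):
    """Plain Lloyd iterations starting from the deterministic seeds."""
    centers = seed_centers(X, k)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as its cluster mean; keep the old center if a cluster is empty.
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
        converged = np.allclose(updated, centers)
        centers = updated
        if converged:
            break
    return labels, centers

def ch_index(X, labels, centers):
    """Calinski-Harabasz index (larger is better).
    ASSUMPTION: used here in place of VCVI, which the abstract does not define."""
    n, k = len(X), len(centers)
    overall = X.mean(axis=0)
    within = sum(((X[labels == j] - centers[j]) ** 2).sum() for j in range(k))
    between = sum((labels == j).sum() * ((centers[j] - overall) ** 2).sum() for j in range(k))
    return (between / (k - 1)) / (within / (n - k))

def choose_k(X):
    """Score every K in 2..floor(sqrt(n)) and return the best K with all scores."""
    k_max = max(2, math.isqrt(len(X)))
    scores = {k: ch_index(X, *kmeans(X, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores

# Toy data: three well-separated 2-D blobs of 20 points each (n = 60, so Kmax = 7).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in [(0, 0), (10, 0), (0, 10)]])
best_k, scores = choose_k(X)
print(best_k, scores)
```

Because the seeding is deterministic, repeated runs on the same data return the same partition, which is the stability property the abstract attributes to replacing random initialization. Swapping `ch_index` for another validity index only requires a function with the same signature.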