聚类分析
计算机科学
算法
树冠聚类算法
数据流聚类
CURE数据聚类算法
拉默-道格拉斯-派克算法
数据压缩
算法设计
数据点
模糊聚类
人工智能
作者
Tapas Kanungo,David M. Mount,Nathan S. Netanyahu,Christine Piatko,Ruth Silverman,Angela Y. Wu
标识
DOI:10.1109/tpami.2002.1017616
摘要
In k-means clustering, we are given a set of n data points in d-dimensional space R/sup d/ and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
科研通智能强力驱动
Strongly Powered by AbleSci AI