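Cluster analysis
Correlation clustering
CURE data clustering algorithm
Categorical variable
Constrained clustering
Computer science
Data mining
Data stream clustering
Canopy clustering algorithm
Fuzzy clustering
Mathematics
Artificial intelligence
Machine learning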
Authors
Aristides Gionis, Heikki Mannila, Panayiotis Tsaparas
Source
Journal: ACM Transactions on Knowledge Discovery from Data
Publisher: Association for Computing Machinery
Date: 2007-03-01
Volume (issue): 1 (1): 4-4
Citations: 756
Identifier
DOI:10.1145/1217299.1217303
Abstract
We consider the following problem: given a set of clusterings, find a single clustering that agrees as much as possible with the input clusterings. This problem, clustering aggregation, appears naturally in various contexts. For example, clustering categorical data is an instance of the clustering aggregation problem; each categorical attribute can be viewed as a clustering of the input rows where rows are grouped together if they take the same value on that attribute. Clustering aggregation can also be used as a metaclustering method to improve the robustness of clustering by combining the output of multiple algorithms. Furthermore, the problem formulation does not require a priori information about the number of clusters; it is naturally determined by the optimization function. In this article, we give a formal statement of the clustering aggregation problem, and we propose a number of algorithms. Our algorithms make use of the connection between clustering aggregation and the problem of correlation clustering. Although the problems we consider are NP-hard, for several of our methods, we provide theoretical guarantees on the quality of the solutions. Our work provides the best deterministic approximation algorithm for the variation of the correlation clustering problem we consider. We also show how sampling can be used to scale the algorithms for large datasets. We give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions.
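To make the problem statement concrete, here is a minimal Python sketch. It assumes each clustering is a list of labels over the same n objects and that "agreement" is measured by pairwise disagreements, i.e. the number of object pairs that one clustering groups together while the other separates them. It turns categorical attributes into input clusterings as the abstract describes, and it builds correlation-clustering weights X[u][v] as the fraction of input clusterings that separate u and v. The merge step is a simple hypothetical greedy heuristic, not necessarily any of the algorithms proposed in the article, and it carries no approximation guarantee. All function names and the sample rows are illustrative.

```python
from itertools import combinations


def disagreements(c1, c2):
    """Count object pairs on which two clusterings disagree.

    A clustering is a list of labels: c[i] is the cluster of object i.
    The pair (u, v) is a disagreement when one clustering puts u and v
    together and the other separates them.
    """
    return sum(
        (c1[u] == c1[v]) != (c2[u] == c2[v])
        for u, v in combinations(range(len(c1)), 2)
    )


def total_disagreement(candidate, clusterings):
    """Aggregation objective (assumed): summed disagreements with all inputs."""
    return sum(disagreements(candidate, c) for c in clusterings)


def pair_distances(clusterings):
    """Correlation-clustering weights: fraction of inputs that separate u and v."""
    m, n = len(clusterings), len(clusterings[0])
    X = [[0.0] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        X[u][v] = X[v][u] = sum(c[u] != c[v] for c in clusterings) / m
    return X


def greedy_merge(clusterings):
    """Hypothetical greedy heuristic (illustration only).

    Start from singletons and repeatedly merge the two clusters with the
    smallest average pairwise distance, as long as that average is below
    1/2, i.e. most inputs keep their members together. The number of
    output clusters is therefore not fixed in advance.
    """
    X = pair_distances(clusterings)
    clusters = [[i] for i in range(len(X))]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            avg = sum(X[u][v] for u in clusters[a] for v in clusters[b]) / (
                len(clusters[a]) * len(clusters[b]))
            if avg < 0.5 and (best is None or avg < best[0]):
                best = (avg, a, b)
        if best is None:
            break
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: combinations yields a < b
    labels = [0] * len(X)
    for k, members in enumerate(clusters):
        for u in members:
            labels[u] = k
    return labels


# Each categorical attribute induces one clustering of the rows (made-up data).
rows = [
    ("red", "small", "yes"),
    ("red", "small", "yes"),
    ("red", "large", "yes"),
    ("blue", "large", "no"),
    ("blue", "large", "no"),
]
clusterings = [[row[j] for row in rows] for j in range(len(rows[0]))]
labels = greedy_merge(clusterings)
print(labels, total_disagreement(labels, clusterings))  # prints [0, 0, 0, 1, 1] 4
```

Here two clusters emerge from the 1/2 threshold rather than from a preset k. Every step above is at least quadratic in the number of objects, so a sketch like this only suits small inputs.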