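Interconnectivity
Closeness
Computer science
Cluster analysis
Data mining
Merge (version control)
Aggregate (composite)
Theoretical computer science
Cluster (spacecraft)
Set (abstract data type)
Algorithm
Artificial intelligence
Mathematics
Information retrieval
Composite material
Mathematical analysis
Materials science
Programming language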
Authors
George Karypis, Eui-Hong Han, Vipin Kumar
Source
Journal: IEEE Computer
[Institute of Electrical and Electronics Engineers]
Date: 1999-08-01
Volume (issue): pages: 32 (8): 68-75
Citations: 1823
Abstract
Clustering is a discovery process in data mining. It groups a set of data so as to maximize the similarity within clusters and minimize the similarity between different clusters. Many advanced algorithms have difficulty with highly variable clusters that do not follow a preconceived model. By basing its merge decisions on both interconnectivity and closeness, the Chameleon algorithm yields accurate results for such highly variable clusters.

Existing algorithms use a static model of the clusters and do not use information about the nature of individual clusters as they are merged. Furthermore, one set of schemes (the CURE algorithm and related schemes) ignores information about the aggregate interconnectivity of items in two clusters. Another set of schemes (the ROCK algorithm, the group-averaging method, and related schemes) ignores information about the closeness of two clusters, as defined by the similarity of the closest items across the two clusters. Because they consider only interconnectivity or only closeness, these algorithms can select and merge the wrong pair of clusters.

Chameleon's key feature is that it accounts for both interconnectivity and closeness when identifying the most similar pair of clusters. It finds the clusters in a data set with a two-phase algorithm. In the first phase, Chameleon uses a graph-partitioning algorithm to divide the data items into several relatively small subclusters. In the second phase, it finds the genuine clusters by repeatedly combining these subclusters.
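The abstract names the two merge criteria but not their formulas. The Python sketch below is one possible reading, not the authors' implementation. It builds a k-nearest-neighbor similarity graph and scores a candidate pair of clusters as relative interconnectivity (RI, total cross-edge weight divided by the clusters' average internal bisection weight) times relative closeness (RC, mean cross-edge weight divided by the size-weighted mean bisection edge weight) raised to a user-chosen exponent alpha. It then runs the second phase by repeatedly merging the best-scoring pair. Several choices here are assumptions made only for illustration: the similarity 1 / (1 + distance), keeping an edge when either endpoint lists the other as a neighbor, the brute-force balanced bisection (usable only on tiny clusters), and the hand-picked subclusters that stand in for the first-phase graph partitioner. All function names are hypothetical.

```python
import math
from itertools import combinations


def knn_graph(points, k):
    """k-NN graph as {frozenset({i, j}): weight}.

    Assumptions of this sketch: weight = 1 / (1 + Euclidean distance), and an
    edge is kept if either endpoint lists the other among its k nearest."""
    w = {}
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for d, j in nearest:
            w[frozenset((i, j))] = 1.0 / (1.0 + d)
    return w


def cut_edges(w, a, b):
    """Weights of edges with one endpoint in a and the other in b (a, b disjoint)."""
    return [wt for e, wt in w.items() if len(e & a) == 1 and len(e & b) == 1]


def min_bisector(w, c):
    """Edge weights cut by the lightest balanced bisection of cluster c.

    Brute force over all halves, so only feasible for tiny clusters; it is a
    stand-in for a real graph partitioner."""
    best = None
    for half in combinations(sorted(c), len(c) // 2):
        h = frozenset(half)
        cut = cut_edges(w, h, c - h)
        if best is None or sum(cut) < sum(best):
            best = cut
    return best


def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def merge_score(w, ci, cj, alpha=2.0, eps=1e-12):
    """RI(ci, cj) * RC(ci, cj) ** alpha; 0.0 when the clusters share no edge."""
    between = cut_edges(w, ci, cj)
    if not between:
        return 0.0
    bi, bj = min_bisector(w, ci), min_bisector(w, cj)
    # Relative interconnectivity: cross-edge weight vs. internal bisector weight.
    ri = sum(between) / ((sum(bi) + sum(bj)) / 2 + eps)
    # Relative closeness: mean cross-edge weight vs. size-weighted internal means.
    ni, nj = len(ci), len(cj)
    internal = ni / (ni + nj) * mean(bi) + nj / (ni + nj) * mean(bj)
    rc = mean(between) / (internal + eps)
    return ri * rc ** alpha


def merge_phase(w, subclusters, n_clusters, alpha=2.0):
    """Second phase: repeatedly merge the pair with the highest score."""
    clusters = [frozenset(c) for c in subclusters]
    while len(clusters) > n_clusters:
        best, pair = 0.0, None
        for a, b in combinations(range(len(clusters)), 2):
            s = merge_score(w, clusters[a], clusters[b], alpha)
            if s > best:
                best, pair = s, (a, b)
        if pair is None:  # no remaining pair is connected in the graph
            break
        merged = clusters[pair[0]] | clusters[pair[1]]
        clusters = [c for idx, c in enumerate(clusters) if idx not in pair]
        clusters.append(merged)
    return clusters


points = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 0), (3, 1), (4, 0), (4, 1),
          (20, 0), (20, 1), (21, 0), (21, 1), (23, 0), (23, 1), (24, 0), (24, 1)]
w = knn_graph(points, k=5)
# First-phase stand-in: hand-picked subclusters instead of a graph partitioner.
subs = [range(0, 4), range(4, 8), range(8, 12), range(12, 16)]
print(sorted(sorted(c) for c in merge_phase(w, subs, n_clusters=2)))
# prints [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15]]
```

In this toy data the k-NN graph has no edges between the left and right groups, so only the two within-group pairs ever score above zero. Multiplying RI by a power of RC means a pair must be both well connected and close to be merged, which reflects the abstract's point that using either criterion alone can pick the wrong pair. Larger alpha values weight closeness more heavily.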