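Cluster analysis
Computer science
Metric (unit)
Artificial intelligence
Data mining
Correlation clustering
Sample (material)
Pattern recognition (psychology)
Pairwise comparison
Clustering high-dimensional data
Embedding
CURE data clustering algorithm
Canopy clustering algorithm
Single-linkage clustering
Data point
Curse of dimensionality
Operations management
Chemistry
Chromatography
Economics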
Authors
Xu Xiong, Zhang Chun, Chenggang Wang, Xiaoyan Zhang, Hua Meng
Source
Journal: Intelligent Automation and Soft Computing
[Computers, Materials and Continua (Tech Science Press)]
Date: 2023-01-01
Volume 37, issue 1, pages 815-831
Identifier
DOI: 10.32604/iasc.2023.034656
Abstract
Clustering analysis is one of the main concerns in data mining. A common approach to clustering is to group points that are close to each other and separate points that are far apart, so how the distance between sample points is measured is crucial to the quality of the clustering. A common supervised way to reconstruct the distance metric is to select features using label information and then measure the distance between samples on those features. In many application scenarios, however, obtaining a large number of labeled samples is very expensive. To address clustering when labeled samples are scarce and the data are high-dimensional, this paper proposes a semi-supervised clustering algorithm built on an improved prototype network. The network uses a small amount of pairwise supervision, such as Must-Link and Cannot-Link constraints, to reconstruct the distance metric of the sample space, and the data are then clustered in the new metric space. The core idea is an embedding mapping that pulls similar samples closer together and pushes dissimilar samples farther apart. Extensive experiments on both real-world and synthetic datasets demonstrate the effectiveness of the algorithm: averaged across the datasets, its clustering metrics improve by 8% over the comparison algorithm.
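The abstract describes the pipeline only at a high level: learn a metric from Must-Link and Cannot-Link pairs, then cluster in the new metric space. As a rough illustration of that two-stage idea, and explicitly not the authors' improved prototype network, the sketch below swaps in a much simpler closed-form diagonal metric. Each feature is weighted by the ratio of its mean squared difference over cannot-link pairs to that over must-link pairs, and plain k-means (Lloyd's algorithm with farthest-first initialization) then runs under the reweighted distance. The function names, the weighting heuristic and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]


def learn_feature_weights(X: List[Point], must_link: List[Pair],
                          cannot_link: List[Pair], eps: float = 1e-9) -> List[float]:
    """Hypothetical diagonal metric learned from pairwise constraints.

    Feature k gets weight mean_CL[(x_ik - x_jk)^2] / mean_ML[(x_ik - x_jk)^2],
    so features that separate cannot-link pairs but not must-link pairs dominate.
    This is a simple heuristic stand-in, not the paper's prototype network.
    """
    d = len(X[0])
    raw = []
    for k in range(d):
        ml = sum((X[i][k] - X[j][k]) ** 2 for i, j in must_link) / len(must_link)
        cl = sum((X[i][k] - X[j][k]) ** 2 for i, j in cannot_link) / len(cannot_link)
        raw.append(cl / (ml + eps))
    total = sum(raw)
    if total == 0:
        return [1.0 / d] * d  # no usable signal: fall back to uniform weights
    return [r / total for r in raw]  # normalize weights to sum to 1


def weighted_sq_dist(a: Point, b: Point, w: Sequence[float]) -> float:
    """Squared distance under the learned diagonal metric."""
    return sum(wk * (ak - bk) ** 2 for wk, ak, bk in zip(w, a, b))


def kmeans(X: List[Point], k: int, w: Sequence[float], iters: int = 50) -> List[int]:
    """Lloyd's k-means under weighted_sq_dist with deterministic farthest-first init."""
    centers = [list(X[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        nxt = max(range(len(X)),
                  key=lambda i: min(weighted_sq_dist(X[i], c, w) for c in centers))
        centers.append(list(X[nxt]))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: weighted_sq_dist(x, centers[c], w))
                      for x in X]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [X[i] for i in range(len(X)) if labels[i] == c]
            if members:
                # With a diagonal metric the mean still minimizes within-cluster distance.
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data: feature 0 carries the cluster structure, feature 1 is large-scale noise.
X = [[0.0, 0.0], [0.1, 10.0], [0.0, 10.0], [0.1, 0.0],
     [1.0, 0.0], [0.9, 10.0], [1.0, 10.0], [0.9, 0.0]]
must_link = [(0, 1), (4, 5)]
cannot_link = [(0, 4), (1, 5)]

w = learn_feature_weights(X, must_link, cannot_link)
print(w)                         # [1.0, 0.0]
print(kmeans(X, 2, w))           # [0, 0, 0, 0, 1, 1, 1, 1]
print(kmeans(X, 2, [1.0, 1.0]))  # [0, 1, 1, 0, 0, 1, 1, 0]  (plain Euclidean)
```

The last two lines show the motivation from the abstract in miniature: under unweighted Euclidean distance the noisy feature 1 dominates and the split is wrong, while two must-link and two cannot-link pairs suffice to suppress it. A per-feature weighting can only rescale axes; a learned embedding such as the paper's prototype network can represent far richer metrics.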