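Cluster analysis
Computer science
Cluster (spacecraft)
Unsupervised learning
Data mining
Correlation clustering
Machine learning
Conceptual clustering
Artificial intelligence
CURE data clustering algorithm
Programming language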
Authors
Jing Zhang,Hong Tao,Chenping Hou
Identifier
DOI:10.1109/tkde.2023.3242306
Abstract
Imbalanced clustering, where the number of samples varies across clusters, arises in many real data mining applications and has gained increasing attention. Nevertheless, due to its unsupervised nature, imbalanced clustering is more challenging than its supervised counterpart, i.e., imbalanced classification. Furthermore, existing imbalanced clustering methods are empirically designed and often lack solid theoretical guarantees, e.g., excess risk estimation. To solve these important but rarely studied problems, we first propose a novel $k$-Means algorithm for the imbalanced clustering problem with Adaptive Cluster Weight (MACW), together with an analysis of its excess clustering risk bound. Inspired by this theoretical result, we further propose an improved algorithm called Imbalanced Clustering with Theoretical Learning Bounds (ICTLB). It refines the weights and encourages the optimal trade-off among per-cluster weights by optimizing the excess clustering risk bound. A theoretically principled justification of ICTLB is provided for verification. Comprehensive experiments on many imbalanced datasets verify the effectiveness of ICTLB in solving cluster imbalance problems.
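The abstract does not spell out the MACW or ICTLB update rules, so the following is only a generic illustration of what "per-cluster weights" can mean in a $k$-means setting, not the authors' method. It is a minimal sketch in which each cluster's squared distance is multiplied by a fixed, user-supplied weight during assignment. The names `weighted_kmeans`, `sq_dist`, and `imbalance_ratio`, as well as the multiplicative weighting rule itself, are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, k, weights=None, n_iter=50, seed=0):
    """Lloyd-style k-means with a fixed multiplicative weight per cluster.

    Illustrative assumption, not the paper's MACW/ICTLB: a point is assigned
    to the cluster j minimising weights[j] * squared distance, so a larger
    weight makes that cluster less attractive. With all weights equal this
    reduces to ordinary k-means.
    """
    rng = random.Random(seed)
    w = [1.0] * k if weights is None else list(weights)
    # Initialise centers from k distinct input points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: weighted nearest center.
        new_labels = [
            min(range(k), key=lambda j: w[j] * sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: mean of members; an empty cluster keeps its old center.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def imbalance_ratio(labels, k):
    """Largest cluster size divided by smallest (empty clusters count as 1)."""
    sizes = [labels.count(j) for j in range(k)]
    return max(sizes) / max(min(sizes), 1)
```

In this sketch the weights are constants chosen by the caller. The abstract indicates that the paper instead adapts the cluster weights and, in ICTLB, refines them by optimizing an excess clustering risk bound; that procedure is not reproduced here.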