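Cluster analysis
Computer science
Data stream clustering
Graph
Reachability
Correlation clustering
Notation
Theoretical computer science
CURE data clustering algorithm
Data mining
Artificial intelligence
Mathematics
Arithmetic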
Authors
Jiarui Sun,Mingjing Du,Zhenkang Lew,Yongquan Dong
Source
Journal: IEEE Transactions on Fuzzy Systems
[Institute of Electrical and Electronics Engineers]
Date: 2024-03-12
Volume (issue): 32 (9): 4927-4939
Citations: 2
Identifier
DOI:10.1109/tfuzz.2024.3369716
Abstract
Many stream clustering algorithms have been proposed recently to mine data streams generated at high speed by hardware platforms and software applications. Density-based methods are widely used because they can handle outliers and capture clusters of arbitrary shape. However, it is still hard to effectively identify multi-density clusters with ambiguous boundaries in a data stream. To address this limitation, this paper introduces a data stream clustering algorithm called TWStream, based on three-way decision theory. It is a two-stage, density-based clustering algorithm. In the online stage, an augmented $k$NN graph is maintained incrementally to accelerate the update of the $k$NN graph. In the offline stage, TWStream introduces the concept of boundary confidence to detect cluster boundaries efficiently and reveal potential cluster cores. It integrates the skewness and sparsity of the data distribution, as well as the evolving trend of the stream. In the next step, a micro-cluster-based three-way clustering strategy is applied to reconstruct latent clusters. It improves the clustering quality of boundary-ambiguous clusters in a stream using a mutual reachability-based clustering approach and a three-way assignment approach. The proposed algorithm is compared with 9 competitors on 15 data streams. Experimental results show that TWStream achieves competitive performance, verifying its effectiveness. The source code of TWStream is available at https://github.com/Du-Team/TWStream .
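The abstract names two building blocks, mutual reachability and three-way assignment, without defining them. The sketch below illustrates the general ideas only and is not the TWStream implementation: it uses the mutual reachability distance as commonly defined for HDBSCAN-style clustering, and a simple distance-ratio rule for three-way assignment. The function names, the ratio rule, and the `alpha` threshold are all assumptions made for illustration.

```python
import math

def core_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (itself excluded)."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def mutual_reachability(points, i, j, k):
    """HDBSCAN-style mutual reachability distance (assumed definition):
    max(core_k(i), core_k(j), d(i, j))."""
    return max(
        core_distance(points, i, k),
        core_distance(points, j, k),
        math.dist(points[i], points[j]),
    )

def three_way_assign(point, centers, alpha=0.5):
    """Hypothetical three-way rule; requires at least two centers.

    Let d1 <= d2 be the distances to the two nearest centers.
    If d1 / d2 <= alpha, the point clearly belongs to the nearest
    cluster and goes to its core region. Otherwise the point is
    ambiguous and goes to the fringe region of both clusters.
    Every other cluster treats the point as outside (trivial region).
    """
    ranked = sorted(range(len(centers)), key=lambda c: math.dist(point, centers[c]))
    first, second = ranked[0], ranked[1]
    d1 = math.dist(point, centers[first])
    d2 = math.dist(point, centers[second])
    ratio = d1 / d2 if d2 > 0 else 1.0  # coincident centers: treat as ambiguous
    if ratio <= alpha:
        return {"core": [first], "fringe": []}
    return {"core": [], "fringe": sorted([first, second])}

points = [(0, 0), (1, 0), (0, 1), (10, 10)]
mr = mutual_reachability(points, 0, 1, k=2)

centers = [(0, 0), (10, 0)]
clear = three_way_assign((1, 0), centers)        # close to center 0
ambiguous = three_way_assign((4.5, 0), centers)  # between both centers
```

The point of the three-way view is that a boundary point is not forced into a single cluster: it can sit in the fringe of several clusters until more evidence arrives, which is what makes the approach attractive for clusters with ambiguous boundaries.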