Cluster analysis
Computer science
Type (biology)
Mathematics
Data mining
Artificial intelligence
Geology
Paleontology
Authors
Xi Xiao, Hailong Ma, Guojun Gan, Qing Li, Bin Zhang, Shu‐Tao Xia
Identifier
DOI:10.1109/tnnls.2024.3392211
Abstract
Data clustering is a fundamental machine learning task that seeks to partition a dataset into homogeneous groups. However, real data usually contain noise, which poses significant challenges to clustering algorithms. In this article, motivated by how the k-means algorithm is derived from a Gaussian mixture model (GMM), we propose a robust k-means-type algorithm, named k-means-type clustering based on t-distribution (KMTD), by assuming that the data points are drawn from a special multivariate t-mixture model (TMM). Compared to the Gaussian distribution, the t-distribution has heavier tails, so the proposed algorithm is more robust to noise. Like the k-means algorithm, the proposed algorithm is simpler than those based on a full TMM. Both synthetic and real data are used to illustrate the proposed algorithm's performance and efficiency. The experimental results demonstrate that the proposed algorithm runs faster than other sophisticated algorithms and, in most cases, achieves higher accuracy.
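The abstract does not give the KMTD update rules, so the following is only a minimal sketch of the general idea it describes: a k-means-style loop in which each point is down-weighted according to a t-distribution, so that distant (noisy) points pull the centers less. It is not the authors' algorithm. The function name `t_kmeans_sketch`, the fixed degrees of freedom `nu`, the identity scale matrix, the hard assignments, and the optional `init` argument are all assumptions made for illustration. The weight `(nu + d) / (nu + squared_distance)` is the standard EM weight for a multivariate t-distribution with identity scale.

```python
import numpy as np

def t_kmeans_sketch(X, k, nu=3.0, n_iter=50, init=None, seed=0):
    """Illustrative k-means-type loop with t-distribution weights.

    Assumptions (not taken from the paper): identity scale matrix,
    fixed degrees of freedom `nu`, hard cluster assignments.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if init is None:
        # Hypothetical default: pick k distinct data points as starting centers.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.array(init, dtype=float)

    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)

        # t-distribution weight: points far from their center get small weights.
        weights = (nu + d) / (nu + sq_dist[np.arange(n), labels])

        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():  # an empty cluster keeps its previous center
                w = weights[mask]
                new_centers[j] = (w[:, None] * X[mask]).sum(axis=0) / w.sum()

        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break

    # Recompute labels so they match the returned centers.
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, sq_dist.argmin(axis=1)
```

With a single far outlier added to a tight cluster, the plain mean is dragged toward the outlier, while the weighted center in this sketch stays near the bulk of the data. Like ordinary k-means, the loop is sensitive to initialization: a center started on an outlier can stay there.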