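Cluster analysis
Computer science
Hierarchical clustering
Algorithm
Single-linkage clustering
Cluster (spacecraft)
Theory (learning stability)
Similarity (geometry)
Canopy clustering algorithm
Set (abstract data type)
k-medoids
Point (geometry)
CURE data clustering algorithm
Center (category theory)
k-medians clustering
Correlation clustering
Complete-linkage clustering
Data mining
Mathematics
Artificial intelligence
Machine learning
Image (mathematics)
Crystallography
Chemistry
Programming language
Geometry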
Authors
Shen-yi QIAN, Huihui Liu, Dai-yi LI
Source
Journal: DEStech Transactions on Computer Science and Engineering
[DEStech Publications]
Date: 2018-06-27
Volume/Issue: (pcmm)
Citations: 1
Identifier
DOI:10.12783/dtcse/pcmm2018/23653
Abstract
K-means is a commonly used text clustering algorithm whose main advantage is that it is simple and fast. However, because the initial cluster centers are selected at random, K-means is prone to converging to a local optimum, and both the clustering results and the number of iterations are unstable. To address this, the paper selects the initial cluster centers with a hierarchical agglomerative clustering algorithm to ensure high-quality starting centers, uses cosine similarity to measure the distance between texts, and reformulates both the cluster-center calculation and the objective function for clustering quality. Experimental results on the Sogou Chinese text corpus show that the improved K-means algorithm achieves relatively high accuracy and stability.
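The general idea in the abstract (seed K-means from an agglomerative clustering, then iterate under cosine similarity) can be illustrated with a minimal sketch. The abstract does not give the paper's linkage criterion, its reformulated center formula, or its objective function, so everything below is an assumption: average linkage is used for seeding, and each center is the normalized mean of its members (the usual spherical K-means update). All function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def mean_direction(vectors):
    """Assumed center update: normalized sum of the member vectors."""
    dim = len(vectors[0])
    return normalize([sum(v[i] for v in vectors) for i in range(dim)])

def agglomerative_seeds(docs, k):
    """Merge the most similar pair of clusters (average linkage, an
    assumption) until k clusters remain; return one center per cluster."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(docs[i], docs[j])
                        for i in clusters[a] for j in clusters[b]]
                avg = sum(sims) / len(sims)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [mean_direction([docs[i] for i in c]) for c in clusters]

def cosine_kmeans(vectors, k, max_iter=100):
    """K-means under cosine similarity, seeded deterministically."""
    docs = [normalize(v) for v in vectors]
    centers = agglomerative_seeds(docs, k)
    labels = None
    for _ in range(max_iter):
        # Assign each document to the center with the highest similarity.
        new_labels = [max(range(k), key=lambda c: cosine(d, centers[c]))
                      for d in docs]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [docs[i] for i in range(len(docs)) if labels[i] == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean_direction(members)
    return labels, centers

# Toy term-frequency vectors standing in for documents (illustrative data).
toy_docs = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 0]]
labels, centers = cosine_kmeans(toy_docs, k=2)
```

Because the seeding step is deterministic, repeated runs on the same input give the same partition, which is the stability property the abstract attributes to replacing random initialization. The pairwise seeding loop is far too slow for a real corpus; in practice it would be run on a sample or replaced by a library implementation.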