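Computer science
Particle swarm optimization
Cluster analysis
Document clustering
Metaheuristic
Artificial intelligence
Data mining
Algorithm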
Authors
Ratnam Dodda,A. Suresh Babu
Identifier
DOI:10.1142/s0218213023500616
Abstract
In the present digital era, millions of Internet users generate vast amounts of data in the form of unstructured text documents. Clustering and organizing these documents plays a crucial role in data analysis and market research applications. This manuscript proposes a modified metaheuristic optimization technique combined with k-means for clustering text documents. In the initial phase, the input data are acquired from three benchmark datasets: Reuters-21578, 20-Newsgroup, and British Broadcasting Corporation (BBC)-sport. The data are then denoised using common preprocessing techniques: stemming, lemmatization, tokenization, and stop word removal. Next, the denoised data are transformed into feature vectors using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. The computed feature vectors are passed to the Modified Particle Swarm Optimization (MPSO) with k-means, which groups closely related text documents by minimizing the similarity between different clusters. In the experiments, the proposed MPSO with k-means model achieved accuracies of 0.85, 0.85, and 0.86 on the Reuters-21578, 20-Newsgroup, and BBC-sport datasets, respectively, which are superior to those of the comparative models.
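The pipeline described in the abstract (TF-IDF vectors, a swarm search over candidate centroids, then k-means refinement) might be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard PSO velocity update rather than the authors' modified variant, whose details the abstract does not give; it uses within-cluster squared error as the fitness function; it skips stemming, lemmatization, and stop word removal; and all function names, parameter values, and toy documents are hypothetical.

```python
import math
import random
from collections import Counter

def tfidf(docs):
    """Whitespace-tokenise docs and return L2-normalised TF-IDF vectors."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    df = Counter(t for doc in tokens for t in set(doc))
    n = len(docs)
    vectors = []
    for doc in tokens:
        tf = Counter(doc)
        # Smoothed IDF: an assumption, the abstract does not state the formula.
        row = [tf[t] / len(doc) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t in vocab]
        norm = math.sqrt(sum(x * x for x in row))
        vectors.append([x / norm for x in row])
    return vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def sse(points, centroids):
    """Fitness (assumed): sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def pso_centroids(points, k, n_particles=10, iters=30,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO (not the paper's MPSO): each particle encodes k centroids."""
    rng = random.Random(seed)
    dim = len(points[0])

    def unflat(pos):
        return [pos[i * dim:(i + 1) * dim] for i in range(k)]

    # Initialise every particle from k randomly chosen data points.
    pos = [[x for p in rng.sample(points, k) for x in p]
           for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [sse(points, unflat(p)) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            cost = sse(points, unflat(pos[i]))
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return unflat(gbest)

def kmeans(points, centroids, iters=20):
    """Lloyd's iterations starting from the given centroids."""
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster ends up empty.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return assign(points, centroids), centroids

# Toy documents (hypothetical, not taken from the benchmark datasets).
docs = ["football match goal team", "team goal football score",
        "stock market shares price", "shares price stock bank"]
vectors = tfidf(docs)
start = pso_centroids(vectors, k=2)
labels, centroids = kmeans(vectors, start)
print(labels)
```

Seeding k-means with the swarm's best centroids is one common way to combine the two methods, since plain k-means is sensitive to its starting centroids; whether the paper combines them in this order is not stated in the abstract.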