tf–idf
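Computer science
Cluster analysis
Document clustering
Information retrieval
Preprocessor
Vector space model
Similarity (geometry)
Graph
Categorization
Spectral clustering
Artificial intelligence
Data mining
Natural language processing
Theoretical computer science
Term (time)
Image (mathematics)
Physics
Quantum mechanics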
Authors
Rowaida Khalil Ibrahim, Subhi R. M. Zeebaree, Karwan Jacksi, Sarkar Hasan Ahmed, Shapol M. Mohammed, Rizgar R. Zebari, Ahmed Alkhayyat, Zryan Najat Rashid
Identifier
DOI:10.1109/iiceta54559.2022.9888613
Abstract
The Internet's continued growth has produced a significant rise in the number of electronic text documents, and grouping these documents into meaningful collections has become crucial. Older approaches to document grouping and categorization were based on statistical characteristics and relied on syntactic rather than semantic information. This article introduces a unique approach for classifying texts based on their semantic similarity. It relies on a graph-based approach, an efficient technique that has been used for clustering. Document summaries, called synopses, are extracted from the Wikipedia and IMDB databases; the downloaded documents are then grouped and preprocessed with the NLTK dictionary to make them more convenient to use. Following that, a vector space model is built using TF-IDF and converted to a numeric TF-IDF matrix, and clustering is performed using spectral methods. The results are compared with previous work.
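The pipeline described in the abstract (TF-IDF matrix, similarity graph, spectral clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, whitespace tokenization in place of the NLTK preprocessing, one common smoothed IDF variant, and a two-way spectral split computed by power iteration. The function names `tfidf_matrix`, `cosine_affinity`, and `spectral_bipartition` are hypothetical, and the sketch assumes at least two non-empty documents.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of term j in doc i."""
    # Assumption: lowercase + whitespace split stands in for NLTK preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumed variant; the abstract gives no formula).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / norm for w in row])  # L2-normalise each document
    return vocab, rows


def cosine_affinity(rows):
    """Pairwise cosine similarity; rows are unit length, so a dot product suffices."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def spectral_bipartition(W, iters=200):
    """Split the graph with affinity matrix W into two clusters (labels 0/1)."""
    n = len(W)
    deg = [sum(row) for row in W]  # positive for non-empty documents
    # Shifted normalised affinity M = D^{-1/2} W D^{-1/2} + I (eigenvalues in [0, 2]).
    M = [[W[i][j] / math.sqrt(deg[i] * deg[j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Leading eigenvector of M is proportional to sqrt(deg); deflate it.
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(x * x for x in top))
    top = [x / top_norm for x in top]
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # The sign of the second eigenvector (rescaled by D^{-1/2}) gives the split.
    return [0 if x / math.sqrt(deg[i]) < 0 else 1 for i, x in enumerate(v)]


# Toy synopses (made up for illustration, not data from the paper).
docs = [
    "space ship alien planet",
    "alien planet invasion space",
    "love wedding romance paris",
    "romance paris love story",
]
vocab, rows = tfidf_matrix(docs)
W = cosine_affinity(rows)
labels = spectral_bipartition(W)
print(labels)
```

On this toy input the two science-fiction synopses should receive one label and the two romance synopses the other. For more than two clusters or a realistic corpus, a library implementation of spectral clustering would be the more practical choice.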