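Embedding
Computer science
Cluster analysis
Cosine similarity
Graph embedding
Document clustering
Graph
Node (physics)
Pattern recognition (psychology)
Artificial intelligence
Data mining
Theoretical computer science
Structural engineering
Engineering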
Authors
Sungwon Jung, Sangmin Ka
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2022-01-01
Volume: 10, pages 130089-130096
Identifier
DOI:10.1109/access.2022.3228548
Abstract
Several document embedding methods for clustering based on deep neural networks (DNNs) have been proposed recently. However, existing DNN-based methods either generate document embeddings that depend on a given number of document clusters, or generate embeddings that do not account for the high similarity between documents belonging to the same cluster. In this paper, we propose a new document embedding method for clustering that uses a graph autoencoder (GAE). To this end, we construct an undirected, weighted sparse graph from a set of documents, in which each document is represented by a node and every weighted edge connects two nodes with high cosine similarity. We then apply the proposed GAE to the graph to compute node embedding vectors, and each node embedding vector is used as the embedding vector of the corresponding document. This paper presents in-depth experimental analyses of the proposed method. Experimental results on various real document data sets demonstrate that the proposed approach yields a significant performance improvement over existing document embedding methods.
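The graph construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words features, the fixed `threshold` cutoff, and the helper names `tf_vector`, `cosine`, and `build_similarity_graph` are all assumptions made for illustration, since the abstract does not state how documents are vectorized or how "high" similarity is decided. The GAE step itself is not shown.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts (an assumed stand-in for the paper's document features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_similarity_graph(docs, threshold=0.5):
    """Return an undirected, weighted sparse graph as {(i, j): weight} with i < j.

    Each document is a node; an edge is kept only when the cosine similarity
    of its two end nodes reaches `threshold` (a hypothetical cutoff).
    """
    vecs = [tf_vector(d) for d in docs]
    edges = {}
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            sim = cosine(vecs[i], vecs[j])
            if sim >= threshold:
                edges[(i, j)] = sim
    return edges


docs = [
    "graph autoencoder node embedding",
    "graph autoencoder document embedding",
    "stock market price forecast",
    "stock market price crash",
]
graph = build_similarity_graph(docs, threshold=0.5)
for (i, j), weight in sorted(graph.items()):
    print(f"{i} -- {j}  weight={weight:.2f}")
```

In this toy corpus only the within-topic document pairs share terms, so the resulting graph is sparse and its edges link documents that would plausibly fall in the same cluster. A graph autoencoder would then take this weighted adjacency structure as input to produce one embedding vector per node.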