Cluster analysis
Autoencoder
Computer science
Artificial intelligence
Feature (linguistics)
Correlation clustering
CURE data clustering algorithm
Data mining
Data stream clustering
Clustering high-dimensional data
Pattern recognition (psychology)
Canopy clustering algorithm
Machine learning
Artificial neural network
Linguistics
Philosophy
Authors
Xifeng Guo,Long Gao,Xinwang Liu,Jianping Yin
Identifier
DOI:10.24963/ijcai.2017/243
Abstract
Deep clustering uses neural networks to learn deep feature representations that favor the clustering task. Some pioneering works propose to learn embedded features and perform clustering simultaneously by explicitly defining a clustering-oriented loss. Although promising performance has been demonstrated in various applications, we observe that a vital ingredient has been overlooked by these works: the defined clustering loss may corrupt the feature space, leading to non-representative, meaningless features that in turn hurt clustering performance. To address this issue, we propose the Improved Deep Embedded Clustering (IDEC) algorithm, which takes care of data structure preservation. Specifically, we manipulate the feature space to scatter data points, using a clustering loss as guidance. To constrain this manipulation and maintain the local structure of the data-generating distribution, an under-complete autoencoder is applied. By integrating the clustering loss with the autoencoder's reconstruction loss, IDEC can jointly optimize cluster label assignment and learn features suitable for clustering while preserving local structure. The resulting optimization problem can be solved effectively by mini-batch stochastic gradient descent and backpropagation. Experiments on image and text datasets empirically validate the importance of local structure preservation and the effectiveness of our algorithm.
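The combined objective the abstract describes (a clustering loss guiding the embedded features, plus the autoencoder's reconstruction loss to preserve local structure) can be sketched as below. This is a minimal NumPy illustration, not the authors' released code: the Student's t soft-assignment kernel and KL-divergence target distribution follow the DEC formulation that IDEC builds on, and the function names and the trade-off weight `gamma` are illustrative assumptions.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    # Soft cluster assignment Q via a Student's t kernel on embedded points z
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    # Sharpened auxiliary target P derived from Q (DEC-style)
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def idec_loss(x, x_recon, z, centers, gamma=0.1):
    # Joint objective L = L_reconstruction + gamma * L_clustering,
    # where the clustering loss is KL(P || Q) averaged over the batch
    q = soft_assign(z, centers)
    p = target_distribution(q)
    l_rec = np.mean((x - x_recon) ** 2)          # autoencoder reconstruction loss
    l_clu = np.sum(p * np.log(p / q)) / len(q)   # clustering loss
    return l_rec + gamma * l_clu
```

In a full implementation the encoder/decoder weights and the cluster centers would all be updated by mini-batch SGD through this loss; keeping `gamma` small lets the reconstruction term constrain the clustering-driven manipulation of the feature space.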