Computer science
Artificial intelligence
Segmentation
Cluster analysis
Leverage (statistics)
Pattern recognition (psychology)
Pixel
Machine learning
Authors
Li Shengqi, Qing Liu, Chaojun Zhang, Yixiong Liang
Identifier
DOI:10.1007/978-981-99-8462-6_7
Abstract
Unsupervised semantic segmentation (USS) aims to identify semantically consistent regions and assign correct categories without annotations. Since self-supervised pre-trained vision transformers (ViT) provide pixel-level features containing rich class-aware information and object distinctions, they have recently been widely used as backbones for unsupervised semantic segmentation. Although these methods achieve exceptional performance, they often rely on parametric classifiers and therefore require prior knowledge of the number of categories. In this work, we investigate adaptively clustering the current mini-batch of images without any prior on the number of categories, and propose the Adaptive Cluster Assignment Module (ACAM) to replace parametric classifiers. Furthermore, we optimize ACAM to generate weights by introducing contrastive learning; these weights are used to re-weight features, thereby producing semantically consistent clusters. Additionally, we leverage an image-text pre-trained model, CLIP, to assign a specific label to each mask obtained from clustering and pixel assignment. Our method achieves new state-of-the-art results on the COCO-Stuff and Cityscapes datasets.
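The core idea of assigning pixel features to clusters without fixing the number of categories in advance can be illustrated with a minimal threshold-based sketch. This is not the paper's ACAM (which learns assignment weights via contrastive learning); it is only a hypothetical stand-in showing how the cluster count can emerge from the data: a feature joins the nearest existing cluster if its cosine similarity exceeds a threshold, and otherwise seeds a new cluster. The function name and `sim_threshold` parameter are assumptions for illustration.

```python
import numpy as np

def adaptive_cluster_assignment(features, sim_threshold=0.7):
    """Assign each feature vector to a cluster without a preset cluster count.

    A feature joins the most similar existing cluster when the cosine
    similarity to that cluster's centroid is at least `sim_threshold`;
    otherwise it starts a new cluster. Illustrative sketch only, not the
    paper's learned ACAM.
    """
    # L2-normalize so dot products are cosine similarities
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    centroids, counts, labels = [], [], []
    for f in features:
        if centroids:
            sims = np.array([c @ f for c in centroids])
            best = int(np.argmax(sims))
            if sims[best] >= sim_threshold:
                # running-mean centroid update, re-normalized to unit length
                counts[best] += 1
                centroids[best] += (f - centroids[best]) / counts[best]
                centroids[best] /= np.linalg.norm(centroids[best])
                labels.append(best)
                continue
        # no sufficiently similar cluster: open a new one
        centroids.append(f.copy())
        counts.append(1)
        labels.append(len(centroids) - 1)
    return np.array(labels), np.stack(centroids)

# Two well-separated feature directions yield two clusters automatically
feats = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
labels, centroids = adaptive_cluster_assignment(feats)
```

In the paper's setting, `features` would be pixel-level ViT embeddings for a mini-batch, and the resulting per-cluster masks would then be named by matching mask-pooled features against CLIP text embeddings.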