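Computer science
Encoder
Artificial intelligence
Robustness (evolution)
Transformer
Pixel
Pattern recognition (psychology)
Computer vision
Machine learning
Biochemistry
Quantum mechanics
Gene
Operating system
Physics
Voltage
Chemistry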
Authors
Meng Meng,Tianzhu Zhang,Zhe Zhang,Yongdong Zhang,Feng Wu
Identifier
DOI:10.1109/tpami.2022.3230902
Abstract
Weakly supervised object localization (WSOL) aims to predict both object locations and categories using only image-level class labels. However, most existing methods rely on class-specific image regions for localization, which results in incomplete object localization. To alleviate this problem, we propose a novel end-to-end task-aware framework with a transformer encoder-decoder architecture (TAFormer) that learns class-agnostic foreground maps. It comprises a representation encoder, a localization decoder, and a classification decoder. The proposed TAFormer enjoys several merits. First, the three modules effectively perform class-agnostic localization and classification in a task-aware manner, achieving remarkable performance on both tasks. Second, an optimal transport algorithm is proposed to provide pixel-level pseudo labels that refine the foreground maps online. To the best of our knowledge, this is the first work to explore a task-aware framework with a transformer architecture and an optimal transport algorithm to achieve accurate object localization for WSOL. Extensive experiments with four backbones on two standard benchmarks demonstrate that TAFormer performs favorably against state-of-the-art methods. Furthermore, we show that the proposed TAFormer is more robust to adversarial attacks and noisy labels.
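The abstract does not spell out how the optimal transport step is formulated, so the following is only a generic sketch of the idea: entropic optimal transport (Sinkhorn iterations) that turns per-pixel foreground scores into pixel-level pseudo labels. It is not the authors' algorithm. The function name `sinkhorn_pseudo_labels`, the cost definition, and the parameters `fg_ratio` (assumed share of foreground mass), `eps`, and `n_iters` are all hypothetical choices made for illustration.

```python
import numpy as np


def sinkhorn_pseudo_labels(scores, fg_ratio=0.3, eps=0.1, n_iters=200):
    """Assign pixels to background (0) or foreground (1) via entropic OT.

    scores   : per-pixel foreground scores in [0, 1], shape (N,)
    fg_ratio : assumed fraction of total mass sent to the foreground class
    eps      : entropic regularization strength (hypothetical default)
    Returns (labels, plan), where plan has shape (N, 2).
    """
    scores = np.asarray(scores, dtype=np.float64)
    n = scores.shape[0]

    # Cost of sending a pixel to [background, foreground]:
    # a high score makes "background" expensive and "foreground" cheap.
    cost = np.stack([scores, 1.0 - scores], axis=1)
    kernel = np.exp(-cost / eps)

    row_mass = np.full(n, 1.0 / n)                  # each pixel has equal mass
    col_mass = np.array([1.0 - fg_ratio, fg_ratio])  # target class masses

    u = np.ones(n)
    v = np.ones(2)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = row_mass / (kernel @ v)
        v = col_mass / (kernel.T @ u)

    plan = u[:, None] * kernel * v[None, :]
    labels = plan.argmax(axis=1)
    return labels, plan
```

In this sketch the column marginal acts as a prior on how much of the image is foreground, so the resulting labels depend on `fg_ratio` as well as on the scores. In an online setting, such labels could be recomputed from the current foreground map at each training step and used as a supervision target.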