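Computer science
Locality
Transformer
Object detection
Decoding methods
Artificial intelligence
Security token
Shrinkage
Pattern recognition (psychology)
Computer vision
Machine learning
Computer network
Algorithm
Engineering
Voltage
Electrical engineering
Philosophy
Linguistics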
Authors
Zhou Huang, Hang Dai, Tian-Zhu Xiang, Shuo Wang, Huaixin Chen, Jie Qin, Huan Xiong
Identifier
DOI:10.1109/cvpr52729.2023.00538
Abstract
Vision transformers have recently shown strong global context modeling capabilities in camouflaged object detection (COD). However, they suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders, neither of which is conducive to camouflaged object detection, a task that relies on subtle cues from indistinguishable backgrounds. To address these issues, in this paper we propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which aims to hierarchically decode locality-enhanced neighboring transformer features through progressive shrinking for camouflaged object detection. Specifically, we propose a non-local token enhancement module (NL-TEM) that employs the non-local mechanism to let neighboring tokens interact and explores graph-based high-order relations within tokens to enhance the local representations of transformers. Moreover, we design a feature shrinkage decoder (FSD) with adjacent interaction modules (AIM), which progressively aggregates adjacent transformer features through a layer-by-layer shrinkage pyramid to accumulate as many imperceptible but effective cues as possible for object information decoding. Extensive quantitative and qualitative experiments demonstrate that the proposed model significantly outperforms 24 existing competitors on three challenging COD benchmark datasets under six widely used evaluation metrics. Our code is publicly available at https://github.com/ZhouHuang23/FSPNet.
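The layer-by-layer shrinkage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes that adjacent features are merged in non-overlapping pairs, that an odd leftover feature is carried to the next layer unchanged, and that a simple element-wise average stands in for the adjacent interaction module. The names `average_merge`, `shrink_once`, and `shrinkage_pyramid` are hypothetical, and each feature is represented as a flat list of floats rather than a real feature map.

```python
from typing import Callable, List

Feature = List[float]  # stand-in for a flattened feature map


def average_merge(a: Feature, b: Feature) -> Feature:
    """Hypothetical placeholder for an adjacent interaction module (AIM).

    The real module is learned; an element-wise mean is used here only
    to make the aggregation pattern visible.
    """
    if len(a) != len(b):
        raise ValueError("features must have the same length")
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def shrink_once(
    features: List[Feature],
    merge: Callable[[Feature, Feature], Feature] = average_merge,
) -> List[Feature]:
    """Merge adjacent features in non-overlapping pairs (assumed scheme)."""
    merged = [
        merge(features[i], features[i + 1])
        for i in range(0, len(features) - 1, 2)
    ]
    # Assumption: an unpaired last feature is passed through unchanged.
    if len(features) % 2 == 1:
        merged.append(features[-1])
    return merged


def shrinkage_pyramid(
    features: List[Feature],
    merge: Callable[[Feature, Feature], Feature] = average_merge,
) -> List[List[Feature]]:
    """Return every pyramid level, from the input down to one feature."""
    if not features:
        raise ValueError("at least one feature is required")
    levels = [list(features)]
    while len(levels[-1]) > 1:
        levels.append(shrink_once(levels[-1], merge))
    return levels


if __name__ == "__main__":
    # Twelve toy features, chosen only for illustration.
    toy = [[float(i)] for i in range(12)]
    for depth, level in enumerate(shrinkage_pyramid(toy)):
        print(depth, len(level))
```

Passing a different `merge` callable swaps in another aggregation rule without changing the pyramid logic, which is the only point the sketch is meant to make.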