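Computer science
Artificial intelligence
Exploit
Upsampling
Transformer
Object detection
Dynamic range
Encoding
Salience
High dynamic range
Pattern recognition (psychology)
Computer vision
Image (mathematics)
Engineering
Electrical engineering
Biochemistry
Chemistry
Computer security
Voltage
Gene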
Authors
Qingping Zheng,Ling Zheng,Jiankang Deng,Ying Li,Changjing Shang,Qiang Shen
Identifier
DOI:10.1016/j.knosys.2023.111075
Abstract
Global context and global contrast are crucial cues for Salient Object Detection (SOD) in images. Most advanced SOD methods exploit CNN-based architectures and achieve impressive results. However, these methods are intrinsically limited in capturing long-range global information, since a CNN extracts features in local sliding windows. In contrast, transformers use a self-attention mechanism to extract features, which gives them a powerful capability for learning global cues. Nonetheless, a pure transformer-based network incurs a large computational overhead and is prone to attention collapse as it goes deeper. To address these issues, we propose a Transformer-based Hierarchical Dynamic Decoder (T-HDDNet) for image salient object detection. Specifically, T-HDDNet employs a transformer to encode each image patch into multi-level, multi-resolution features based on the long-range dependencies among pixels. To obtain an accurate high-resolution saliency map, we develop a dynamic dual upsampling mechanism that enlarges the spatial size of features in a data-driven manner, together with a dynamic feature fusion unit. Finally, hierarchical dynamic decoders built on these two units produce the final saliency map progressively. Extensive experimental results show that the proposed method achieves the best performance on all benchmarks compared with state-of-the-art methods.
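The contrast the abstract draws between local sliding windows and self-attention can be illustrated with a minimal toy sketch. This is not the T-HDDNet implementation: it assumes standard scaled dot-product self-attention with identity projections (queries, keys, and values are the raw tokens), and the names `self_attention`, `local_average`, and `patches` are hypothetical, chosen only for this illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption: identity projections (Q = K = V = tokens), so no learned weights.
    Returns the attended tokens and the attention weight matrix.
    """
    d = len(tokens[0])
    out, weights = [], []
    for q in tokens:
        # Each query is scored against every token, near or far.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(d)])
    return out, weights


def local_average(tokens, radius=1):
    """Sliding-window average: each output sees only neighbours within `radius`."""
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        window = tokens[max(0, i - radius): i + radius + 1]
        out.append([sum(t[j] for t in window) / len(window) for j in range(d)])
    return out


# Five toy "patch embeddings"; the first and last carry a signal in channel 0.
patches = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
attended, weights = self_attention(patches)
local = local_average(patches)
```

In this toy setup the middle patch receives nothing from channel 0 under the local window, because both signal-carrying patches lie outside its radius, whereas self-attention assigns a nonzero weight to every patch and so mixes in information from both ends of the sequence. The quadratic number of pairwise scores is also a simple way to see why the abstract describes pure transformer networks as computationally expensive.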