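Computer science
RGB color model
Artificial intelligence
Encoder
Pattern recognition (psychology)
Backbone network
Transformer
Decoding methods
Deep learning
Computer vision
Algorithm
Computer network
Quantum mechanics
Operating system
Physics
Voltage
Authors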
Chao Zeng,Sam Kwong,Horace H. S. Ip
Source
Journal: Neurocomputing
[Elsevier]
Date: 2023-09-17
Volume/Issue: 559: 126779-126779
Citations: 15
Identifier
DOI:10.1016/j.neucom.2023.126779
Abstract
Depth information is important for RGB-D Salient Object Detection (SOD), yet conventional deep models usually rely on CNN feature extractors. Long-range contextual dependencies, dense modeling in the saliency decoder, and multi-task learning assistance are usually ignored. In this work, we propose a Dual Swin-Transformer-based Mutual Interactive Network (DTMINet), which aims to learn contextualized, dense, and edge-aware features for RGB-D SOD. We adopt the Swin-Transformer as the visual backbone to extract contextualized features. A self-attention-based Cross-Modality Interaction module is proposed to strengthen the visual backbone for cross-modal interaction. In addition, a Gated Modality Attention module is designed for cross-modal fusion. The proposed Dense Saliency Decoder, enhanced with dense connections, progressively merges the multi-level encoding features at different decoding stages. Considering the depth quality issue, a Skip Convolution module is introduced to provide guidance to the RGB modality for the saliency prediction. We also add edge prediction to the saliency predictor to regularize the learning process. Comprehensive experiments on five standard RGB-D SOD benchmark datasets over four evaluation metrics demonstrate the superiority of the proposed method.
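
The abstract does not specify how the Gated Modality Attention module computes its fusion. As a rough illustration of the general idea of gated cross-modal fusion only, the sketch below blends an RGB feature vector and a depth feature vector with a per-element sigmoid gate. The function names `sigmoid` and `gated_fusion` and the parameters `w_rgb`, `w_depth`, and `bias` are hypothetical and are not taken from the paper; the actual module may differ substantially.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb_feat, depth_feat, w_rgb=1.0, w_depth=1.0, bias=0.0):
    """Blend two equal-length feature vectors with a per-element gate.

    Hypothetical sketch, not the paper's module: the gate g is computed
    from both modalities, then the output is g * rgb + (1 - g) * depth.
    """
    if len(rgb_feat) != len(depth_feat):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for r, d in zip(rgb_feat, depth_feat):
        g = sigmoid(w_rgb * r + w_depth * d + bias)  # gate in (0, 1)
        fused.append(g * r + (1.0 - g) * d)          # convex combination
    return fused
```

With zero weights and zero bias the gate is 0.5 everywhere, so the output is the element-wise mean of the two inputs; a large positive bias pushes the gate toward 1 and the output toward the RGB features, which is one simple way a network could down-weight a low-quality depth map.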