Computer Science
Artificial Intelligence
RGB Color Model
Upsampling
Pixel
Computer Vision
Pattern Recognition (Psychology)
Salience
Image (Mathematics)
Authors
Mingfeng Jiang, Jianhua Ma, Jiatong Chen, Yaming Wang, Xian Fang
Identifier
DOI: 10.1016/j.knosys.2024.111597
Abstract
Multimodal salient object detection (SOD) combines images from different modalities to produce a saliency map of the most visually prominent objects. When fusing multimodal and multiscale features, maintaining both the integrity and the fine granularity of the target is critical to improving multimodal SOD performance. Differences in fine-grained information between modalities, together with the patch-level resolution of transformer features, prevent most existing methods from preserving both properties. We therefore propose a patch-to-pixel attention-aware transformer network (PATNet), in which a decision-transformation strategy maps global patches onto local pixels, preserving both the integrity and the fine-grained details of the saliency map. Specifically, PATNet consists of a shared attention fusion module (SAFM), an adjacent modeling fusion module (AMFM), and a fine-grained mapping module (FMM). SAFM enhances consistency between multimodal features through a shared attention matrix and an identical convolutional feed-forward network. AMFM enhances low-resolution features by modeling adjacent features, avoiding the aliasing effects of upsampling. In the output stage, FMM maps the patch-level feature maps onto pixels, restoring the details of salient objects. Extensive experiments demonstrate that PATNet outperforms 24 state-of-the-art methods on six RGB-D and three RGB-T datasets. The source code is publicly available at https://github.com/LitterMa-820/PATNet.
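To make the shared-attention idea concrete, below is a minimal PyTorch sketch of an SAFM-style fusion block. It is a sketch under stated assumptions, not the authors' method: the class name `SharedAttentionFusion`, the averaging of the two modalities' affinity matrices into one shared attention matrix, and the residual convolutional feed-forward network are all illustrative choices; the actual implementation is in the linked repository.

```python
# Illustrative sketch only: names, the affinity-averaging scheme, and the
# residual conv-FFN are assumptions, not the authors' published code.
import torch
import torch.nn as nn


class SharedAttentionFusion(nn.Module):
    """One reading of SAFM: a single attention matrix and a single
    (weight-shared) convolutional feed-forward network applied to both
    modality streams (e.g., RGB and depth/thermal)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        # Per-modality query/key/value projections.
        self.qkv_rgb = nn.Linear(dim, dim * 3)
        self.qkv_aux = nn.Linear(dim, dim * 3)
        # One conv FFN reused for both streams (weight sharing pushes the
        # two modalities toward consistent feature statistics).
        self.ffn = nn.Sequential(
            nn.Conv2d(dim, dim * 2, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim * 2, dim, 3, padding=1),
        )

    def _heads(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        return x.view(b, n, self.num_heads, c // self.num_heads).transpose(1, 2)

    def forward(self, rgb: torch.Tensor, aux: torch.Tensor):
        # rgb, aux: (B, C, H, W) patch-level feature maps at one scale.
        b, c, h, w = rgb.shape
        n = h * w
        r = rgb.flatten(2).transpose(1, 2)  # (B, N, C)
        a = aux.flatten(2).transpose(1, 2)
        qr, kr, vr = self.qkv_rgb(r).chunk(3, dim=-1)
        qa, ka, va = self.qkv_aux(a).chunk(3, dim=-1)
        # Shared attention matrix: average the two modalities' affinities so
        # both streams aggregate context with identical weights.
        affinity = (self._heads(qr) @ self._heads(kr).transpose(-2, -1)
                    + self._heads(qa) @ self._heads(ka).transpose(-2, -1))
        attn = (affinity * self.scale / 2).softmax(dim=-1)  # (B, heads, N, N)

        def aggregate(v: torch.Tensor) -> torch.Tensor:
            out = (attn @ self._heads(v)).transpose(1, 2).reshape(b, n, c)
            out = out.transpose(1, 2).reshape(b, c, h, w)  # back to feature map
            return out + self.ffn(out)  # identical conv FFN, residual

        return aggregate(vr), aggregate(va)


if __name__ == "__main__":
    fuse = SharedAttentionFusion(dim=64)
    rgb = torch.randn(2, 64, 16, 16)    # RGB features
    depth = torch.randn(2, 64, 16, 16)  # depth (or thermal) features
    fr, fd = fuse(rgb, depth)
    print(fr.shape, fd.shape)  # torch.Size([2, 64, 16, 16]) each
```

Forcing both streams through one attention matrix and one FFN makes the two modalities aggregate context with identical weights, which is one way to read the abstract's claim that SAFM "enhances consistency between multimodal features".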