Fusion
Artificial intelligence
Computer vision
Computer science
Object detection
Object (grammar)
Sensor fusion
Pattern recognition (psychology)
Salience
Image fusion
Image (mathematics)
Philosophy
Linguistics
Authors
Liuxin Bao, Xiaofei Zhou, Bolun Zheng, Runmin Cong, Haibing Yin, Jiyong Zhang, Chenggang Yan
Identifier
DOI: 10.1109/tip.2025.3527372
Abstract
Visible-depth-thermal (VDT) salient object detection (SOD) aims to highlight the most visually attractive object by exploiting triple-modal cues. However, existing models do not sufficiently explore the correlations and differences among the modalities, which leads to unsatisfactory detection performance. In this paper, we propose an interaction, fusion, and enhancement network (IFENet) for the VDT SOD task, which comprises three key steps: multi-modal interaction, multi-modal fusion, and spatial enhancement. Specifically, built on a Transformer backbone, our IFENet acquires multi-scale multi-modal features. Firstly, the inter-modal and intra-modal graph-based interaction (IIGI) module is deployed to explore inter-modal channel correlation and intra-modal long-term spatial dependency. Secondly, the gated attention-based fusion (GAF) module is employed to purify and aggregate the triple-modal features, where multi-modal features are filtered along the spatial, channel, and modality dimensions, respectively. Lastly, the frequency split-based enhancement (FSE) module separates the fused feature into high-frequency and low-frequency components to enhance the spatial information (i.e., boundary details and object location) of the salient object. Extensive experiments on the VDT-2048 dataset show that our saliency model consistently outperforms 13 state-of-the-art models. Our code and results are available at https://github.com/Lx-Bao/IFENet.
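To make the gated fusion idea concrete, the sketch below shows one plausible way a module could filter three modality features (visible, depth, thermal) along spatial, channel, and modality dimensions and then aggregate them, as the abstract describes. The class name GatedFusionSketch, the layer choices, and all hyperparameters are illustrative assumptions, not the authors' GAF implementation (see the repository linked above for that).

```python
# A minimal sketch of gated multi-modal fusion, assuming sigmoid gates for the
# spatial and channel dimensions and a softmax gate across the three modalities.
import torch
import torch.nn as nn


class GatedFusionSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Spatial gate: a per-pixel weight map from a 1x1 conv + sigmoid.
        self.spatial_gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        # Channel gate: squeeze-and-excitation-style per-channel weighting.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
        # Modality gate: one scalar score per modality, normalized by softmax.
        self.modality_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, 1, 1),
        )

    def forward(self, rgb, depth, thermal):
        feats = [rgb, depth, thermal]
        # Filter each modality feature along the spatial and channel dimensions.
        gated = [f * self.spatial_gate(f) * self.channel_gate(f) for f in feats]
        # Score the modalities against each other, then take a weighted sum.
        scores = torch.cat([self.modality_gate(f) for f in gated], dim=1)  # (B, 3, 1, 1)
        weights = torch.softmax(scores, dim=1)
        return sum(weights[:, i:i + 1] * gated[i] for i in range(len(gated)))
```

Calling this module on three (B, C, H, W) tensors of the same shape returns a single fused (B, C, H, W) map; the softmax over modality scores is one simple way to let the network suppress an unreliable modality per sample.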
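Similarly, the frequency split can be illustrated with a low-pass/residual decomposition: a blur (here, average pooling) yields the low-frequency component carrying coarse object location, and subtracting it leaves a high-frequency residual carrying boundary details. The split-and-recombine structure follows the abstract's description of FSE, but the filter and the enhancement branches below are placeholder assumptions.

```python
# A minimal sketch of frequency split-based enhancement, assuming an
# average-pooling low-pass filter; kernel_size should be odd so the
# pooled output keeps the input's spatial size.
import torch.nn as nn
import torch.nn.functional as F


class FrequencySplitSketch(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 5):
        super().__init__()
        self.kernel_size = kernel_size
        # Placeholder enhancement branches: the low-frequency path refines
        # object location, the high-frequency path refines boundary details.
        self.low_branch = nn.Conv2d(channels, channels, 3, padding=1)
        self.high_branch = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, fused):
        # Low-pass filtering via same-size average pooling.
        low = F.avg_pool2d(fused, self.kernel_size, stride=1,
                           padding=self.kernel_size // 2)
        # The high-frequency residual keeps edges and fine structure.
        high = fused - low
        # Enhance each band separately, then recombine.
        return self.low_branch(low) + self.high_branch(high)
```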