RGB color model
Computer science
Artificial intelligence
Modal verb
Feature (linguistics)
Benchmark (surveying)
Pattern recognition (psychology)
Modality (human–computer interaction)
Fuse (electrical)
Computer vision
Engineering
Geodesy
Philosophy
Chemistry
Electrical engineering
Polymer chemistry
Geography
Linguistics
Authors
Wei Gao, Guibiao Liao, Siwei Ma, Ge Li, Yongsheng Liang, Weisi Lin
Identifier
DOI: 10.1109/TCSVT.2021.3082939
Abstract
The use of complementary information, namely depth or thermal information, has shown its benefits for salient object detection (SOD) in recent years. However, the RGB-D and RGB-T SOD problems are currently solved only independently, and most existing methods directly extract and fuse raw features from backbones. Such methods can easily be restricted by low-quality modality data and redundant cross-modal features. In this work, a unified end-to-end framework is designed to simultaneously analyze RGB-D and RGB-T SOD tasks. Specifically, to effectively tackle multi-modal features, we propose a novel multi-stage and multi-scale fusion network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Similar to the visual color stage doctrine in the human visual system (HVS), the proposed CMFM aims to explore important feature representations in the feature response stage and integrate them into cross-modal features in the adversarial combination stage. Moreover, the proposed BMD learns the combination of multi-level cross-modal fused features to capture both local and global information of salient objects, and can further boost multi-modal SOD performance. The proposed unified cross-modality feature analysis framework, based on two-stage and multi-scale information fusion, can be used for diverse multi-modal SOD tasks. Comprehensive experiments (∼92K image pairs) demonstrate that the proposed method consistently outperforms 21 other state-of-the-art methods on nine benchmark datasets. This validates that the proposed method works well on diverse multi-modal SOD tasks with good generalization and robustness, and provides a good multi-modal SOD benchmark.
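The abstract describes a two-module architecture: per-stage cross-modal fusion (CMFM) feeding a bi-directional multi-scale decoder (BMD). The following minimal PyTorch sketch illustrates that structure under stated assumptions: channel attention stands in for the "feature response" stage, a learned gate for the "adversarial combination" stage, and bilinear upsampling plus pooling for the decoder's two directions. All layer choices, channel sizes, and class names here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the MMNet idea from the abstract; every design
# detail below is an assumption made for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CMFM(nn.Module):
    """Cross-modal fusion at one backbone stage (assumed design).

    Stage 1 ("feature response"): channel attention re-weights each modality.
    Stage 2 ("adversarial combination"): a learned spatial gate mixes the two
    streams, so a low-quality modality can be suppressed pixel-wise.
    """
    def __init__(self, channels):
        super().__init__()
        self.att_rgb = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(channels, channels, 1),
                                     nn.Sigmoid())
        self.att_aux = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(channels, channels, 1),
                                     nn.Sigmoid())
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1),
                                  nn.Sigmoid())

    def forward(self, f_rgb, f_aux):
        f_rgb = f_rgb * self.att_rgb(f_rgb)           # response stage
        f_aux = f_aux * self.att_aux(f_aux)
        g = self.gate(torch.cat([f_rgb, f_aux], 1))   # combination stage
        return g * f_rgb + (1 - g) * f_aux            # fused cross-modal feature

class BMD(nn.Module):
    """Bi-directional multi-scale decoder (assumed design): a top-down pass
    spreads global (deep) context, a bottom-up pass restores local detail."""
    def __init__(self, channels_per_level):
        super().__init__()
        self.smooth = nn.ModuleList(nn.Conv2d(c, 64, 3, padding=1)
                                    for c in channels_per_level)
        self.predict = nn.Conv2d(64, 1, 1)

    def forward(self, fused_feats):  # ordered shallow (large) -> deep (small)
        feats = [s(f) for s, f in zip(self.smooth, fused_feats)]
        # top-down: upsample deeper features and add global context
        for i in range(len(feats) - 2, -1, -1):
            feats[i] = feats[i] + F.interpolate(feats[i + 1],
                                                size=feats[i].shape[2:],
                                                mode='bilinear',
                                                align_corners=False)
        # bottom-up: pool shallower features back down to add local detail
        for i in range(1, len(feats)):
            feats[i] = feats[i] + F.adaptive_max_pool2d(feats[i - 1],
                                                        feats[i].shape[2:])
        return self.predict(feats[0])  # saliency map at the finest scale

if __name__ == "__main__":
    # Toy run with three feature levels; the auxiliary stream could be a
    # depth or thermal backbone, matching the RGB-D / RGB-T setting.
    specs = ((64, 64), (128, 32), (256, 16))
    cmfms = [CMFM(c) for c, _ in specs]
    rgb = [torch.randn(1, c, s, s) for c, s in specs]
    aux = [torch.randn(1, c, s, s) for c, s in specs]
    fused = [m(r, a) for m, r, a in zip(cmfms, rgb, aux)]
    print(BMD([c for c, _ in specs])(fused).shape)  # torch.Size([1, 1, 64, 64])
```

The gate makes the combination stage a soft per-pixel choice between modalities rather than a fixed sum, which is one simple way to realize the abstract's goal of resisting low-quality modality data.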