Computer science
Artificial intelligence
Concatenation (mathematics)
Modal verb
Feature (linguistics)
Pattern recognition (psychology)
Salience
Encoder
Fusion
Computer vision
Mathematics
Philosophy
Combinatorics
Operating system
Chemistry
Polymer chemistry
Linguistics
Authors
Bin Wan, Xiaofei Zhou, Yaoqi Sun, Tingyu Wang, Chengtao Lv, Shuai Wang, Haibing Yin, Chenggang Yan
Identifier
DOI: 10.1109/tmm.2023.3291823
Abstract
This article discusses the limitations of single- and two-modal salient object detection (SOD) methods and the emergence of multi-modal SOD techniques that integrate visible, depth, or thermal information. However, current multi-modal methods often rely on simple fusion operations, such as addition, multiplication, and concatenation, to combine the different modalities, which is ineffective in challenging scenes such as low illumination and cluttered backgrounds. To address this issue, we propose a novel multi-modal feature fusion network (MFFNet) for V-D-T salient object detection, whose two key components are a triple-modal deep fusion encoder and a progressive feature enhancement decoder. The triple-modal deep fusion (TDF) module integrates the features of the three modalities and exploits their complementarity through mutual optimization during the encoding phase. In addition, the progressive feature enhancement decoder consists of a weighted context-enhanced feature (WCF) module, a region optimization (RO) module, and a boundary perception (BP) module, which together produce region-aware and contour-aware features. Finally, a multi-scale fusion (MF) module integrates these features to generate high-quality saliency maps. Extensive experiments on the VDT-2048 dataset show that the proposed MFFNet outperforms 12 state-of-the-art multi-modal methods.
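Since the abstract contrasts naive fusion (addition, multiplication, concatenation) with the mutual optimization performed by the TDF encoder, the following PyTorch sketch illustrates that contrast in miniature. It is a hypothetical illustration, not the authors' implementation: the class names (NaiveFusion, MutualFusion), the sigmoid gating design, and the channel sizes are all assumptions made for clarity.

```python
import torch
import torch.nn as nn


class NaiveFusion(nn.Module):
    """Baseline criticized in the abstract: concatenate the three
    modalities and mix them with a 1x1 convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, fv, fd, ft):
        return self.mix(torch.cat([fv, fd, ft], dim=1))


class MutualFusion(nn.Module):
    """Hypothetical sketch of mutual optimization: each modality is
    re-weighted by a sigmoid gate computed from the other two, so
    complementary cues can suppress an unreliable stream (e.g., noisy
    visible features under low illumination)."""

    def __init__(self, channels: int):
        super().__init__()

        def gate():
            return nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        self.gate_v, self.gate_d, self.gate_t = gate(), gate(), gate()
        self.mix = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, fv, fd, ft):
        # Sequentially refine each stream with a gate driven by the
        # other two streams, then mix the refined features.
        fv = fv * self.gate_v(torch.cat([fd, ft], dim=1))
        fd = fd * self.gate_d(torch.cat([fv, ft], dim=1))
        ft = ft * self.gate_t(torch.cat([fv, fd], dim=1))
        return self.mix(torch.cat([fv, fd, ft], dim=1))


if __name__ == "__main__":
    # Dummy visible, depth, and thermal feature maps (batch 2, 64 channels).
    fv, fd, ft = (torch.randn(2, 64, 32, 32) for _ in range(3))
    print(MutualFusion(64)(fv, fd, ft).shape)  # torch.Size([2, 64, 32, 32])
```

The point of the gated variant is that fusion weights become content-dependent: unlike plain addition or concatenation, one modality can down-weight another where it is unreliable, which is the kind of cross-modal complementarity the TDF module is designed to exploit.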