RGB color model
Computer science
Artificial intelligence
Modal verb
Feature (linguistics)
Benchmark (surveying)
Pattern recognition (psychology)
Modality (human–computer interaction)
Fuse (electrical)
Computer vision
Engineering
Geodesy
Philosophy
Chemistry
Electrical engineering
Polymer chemistry
Geography
Linguistics
Authors
Wei Gao, Guibiao Liao, Siwei Ma, Ge Li, Yongsheng Liang, Weisi Lin
Identifier
DOI: 10.1109/TCSVT.2021.3082939
Abstract
The use of complementary information, namely depth or thermal information, has shown its benefits for salient object detection (SOD) in recent years. However, the RGB-D and RGB-T SOD problems are currently solved only independently, and most existing methods directly extract and fuse raw features from backbones. Such methods can easily be restricted by low-quality modality data and redundant cross-modal features. In this work, a unified end-to-end framework is designed to simultaneously analyze RGB-D and RGB-T SOD tasks. Specifically, to effectively tackle multi-modal features, we propose a novel multi-stage and multi-scale fusion network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Similar to the visual color stage doctrine in the human visual system (HVS), the proposed CMFM aims to explore important feature representations in the feature response stage and integrate them into cross-modal features in the adversarial combination stage. Moreover, the proposed BMD learns the combination of multi-level cross-modal fused features to capture both local and global information of salient objects, and can further boost multi-modal SOD performance. The proposed unified cross-modality feature analysis framework, based on two-stage and multi-scale information fusion, can be used for diverse multi-modal SOD tasks. Comprehensive experiments (∼92K image pairs) demonstrate that the proposed method consistently outperforms 21 other state-of-the-art methods on nine benchmark datasets. This validates that the proposed method works well on diverse multi-modal SOD tasks with good generalization and robustness, and provides a good multi-modal SOD benchmark.
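The abstract describes a two-module architecture: per-stage cross-modal fusion (CMFM) feeding a bi-directional multi-scale decoder (BMD). The following minimal PyTorch sketch illustrates that structure under stated assumptions: channel attention stands in for the "feature response" stage, a learned gate for the "adversarial combination" stage, and bilinear upsampling plus pooling for the decoder's two directions. All layer choices, channel sizes, and class names here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the MMNet idea from the abstract; every design
# detail below is an assumption made for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CMFM(nn.Module):
    """Cross-modal fusion at one backbone stage (assumed design).

    Stage 1 ("feature response"): channel attention re-weights each modality.
    Stage 2 ("adversarial combination"): a learned spatial gate mixes the two
    streams, so a low-quality modality can be suppressed pixel-wise.
    """
    def __init__(self, channels):
        super().__init__()
        self.att_rgb = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(channels, channels, 1),
                                     nn.Sigmoid())
        self.att_aux = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(channels, channels, 1),
                                     nn.Sigmoid())
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1),
                                  nn.Sigmoid())

    def forward(self, f_rgb, f_aux):
        f_rgb = f_rgb * self.att_rgb(f_rgb)           # response stage
        f_aux = f_aux * self.att_aux(f_aux)
        g = self.gate(torch.cat([f_rgb, f_aux], 1))   # combination stage
        return g * f_rgb + (1 - g) * f_aux            # fused cross-modal feature

class BMD(nn.Module):
    """Bi-directional multi-scale decoder (assumed design): a top-down pass
    spreads global (deep) context, a bottom-up pass restores local detail."""
    def __init__(self, channels_per_level):
        super().__init__()
        self.smooth = nn.ModuleList(nn.Conv2d(c, 64, 3, padding=1)
                                    for c in channels_per_level)
        self.predict = nn.Conv2d(64, 1, 1)

    def forward(self, fused_feats):  # ordered shallow (large) -> deep (small)
        feats = [s(f) for s, f in zip(self.smooth, fused_feats)]
        # top-down: upsample deeper features and add global context
        for i in range(len(feats) - 2, -1, -1):
            feats[i] = feats[i] + F.interpolate(feats[i + 1],
                                                size=feats[i].shape[2:],
                                                mode='bilinear',
                                                align_corners=False)
        # bottom-up: pool shallower features back down to add local detail
        for i in range(1, len(feats)):
            feats[i] = feats[i] + F.adaptive_max_pool2d(feats[i - 1],
                                                        feats[i].shape[2:])
        return self.predict(feats[0])  # saliency map at the finest scale

if __name__ == "__main__":
    # Toy run with three feature levels; the auxiliary stream could be a
    # depth or thermal backbone, matching the RGB-D / RGB-T setting.
    specs = ((64, 64), (128, 32), (256, 16))
    cmfms = [CMFM(c) for c, _ in specs]
    rgb = [torch.randn(1, c, s, s) for c, s in specs]
    aux = [torch.randn(1, c, s, s) for c, s in specs]
    fused = [m(r, a) for m, r, a in zip(cmfms, rgb, aux)]
    print(BMD([c for c, _ in specs])(fused).shape)  # torch.Size([1, 1, 64, 64])
```

The gate makes the combination stage a soft per-pixel choice between modalities rather than a fixed sum, which is one simple way to realize the abstract's goal of resisting low-quality modality data.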