Concepts
Computer science, RGB color model, Artificial intelligence, Context (archaeology), Segmentation, Modal verb, Feature (linguistics), Noise (video), Pattern recognition (psychology), Dual (grammatical number), Modality (human-computer interaction), Computer vision, Image (mathematics), Literature type, Polymer chemistry, Art, Paleontology, Linguistics, Philosophy, Chemistry, Biology
Authors
Xiangyu Guo, Wei Ma, Fangfang Liang, Qing Mi
Identifier
DOI: 10.1016/j.eswa.2024.124598
Abstract
Complementarily fusing RGB and depth images while effectively suppressing task-irrelevant noise is crucial for achieving accurate indoor RGB-D semantic segmentation. In this paper, we propose a novel deep model that leverages dual-modal non-local context to guide the aggregation of complementary features and the suppression of noise at multiple stages. Specifically, we introduce a dual-modal non-local context encoding (DNCE) module to learn global representations for each modality at each stage, which are then utilized to facilitate cross-modal complementary clue aggregation (CCA). Subsequently, the enhanced features from both modalities are merged. Additionally, we propose a semantic-guided feature rectification (SGFR) module to exploit rich semantic clues in the top-level merged features for suppressing noise in the lower-stage merged features. Both the DNCE-CCA and SGFR modules provide dual-modal global views that are essential for effective RGB-D fusion. Experimental results on two public indoor datasets, NYU Depth V2 and SUN RGB-D, demonstrate that our proposed method outperforms state-of-the-art models of similar complexity.
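Since the abstract describes the architecture only at a high level, the following is a minimal PyTorch sketch of the core idea as stated: encode a global (non-local) context vector per modality, use each modality's context to guide enhancement of the other modality's features (cross-modal complementary clue aggregation), then merge. Every class, method, and parameter name here is hypothetical; the authors' actual DNCE and CCA designs are not specified in the abstract and likely differ.

```python
# A minimal sketch of the fusion idea described in the abstract, NOT the
# authors' code: per-modality global (non-local) context vectors guide
# cross-modal feature enhancement before merging. All names here
# (DualModalFusionSketch, rgb_gate, ...) are hypothetical.
import torch
import torch.nn as nn


class DualModalFusionSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs predict per-pixel attention logits for global pooling
        self.rgb_attn = nn.Conv2d(channels, 1, kernel_size=1)
        self.depth_attn = nn.Conv2d(channels, 1, kernel_size=1)
        # Each modality's context vector becomes channel gates for the other
        self.rgb_gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        self.depth_gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    @staticmethod
    def global_context(feat: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
        # Attention-weighted spatial pooling: (B, C, H, W) -> (B, C)
        weights = logits.flatten(2).softmax(dim=-1)                   # (B, 1, H*W)
        return torch.bmm(weights, feat.flatten(2).transpose(1, 2)).squeeze(1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        ctx_rgb = self.global_context(rgb, self.rgb_attn(rgb))        # (B, C)
        ctx_depth = self.global_context(depth, self.depth_attn(depth))
        # Cross-modal gating: depth context modulates RGB channels, and vice
        # versa, so each modality aggregates the other's complementary clues
        rgb_enhanced = rgb * self.depth_gate(ctx_depth)[:, :, None, None]
        depth_enhanced = depth * self.rgb_gate(ctx_rgb)[:, :, None, None]
        return self.merge(torch.cat([rgb_enhanced, depth_enhanced], dim=1))
```

For example, `DualModalFusionSketch(256)(rgb_feat, depth_feat)` fuses two `(B, 256, H, W)` feature maps into one of the same shape; per the abstract, such fusion would be applied at each encoder stage, with the SGFR module separately using top-level semantic clues to rectify the lower-stage merged features.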