Salient Object Detection in RGB-D Videos

人工智能计算机视觉计算机科学目标检测 RGB颜色模型对象类检测突出对象（语法）模式识别（心理学）人脸检测面部识别系统

作者

Ao Mou,Yukang Lu,Jiahao He,Dingyao Min,Keren Fu,Qijun Zhao

出处

期刊：IEEE transactions on image processing [Institute of Electrical and Electronics Engineers]
日期：2024-01-01 卷期号：33: 6660-6675 被引量：9

链接

nih.govdoi.org

标识

DOI：10.1109/tip.2024.3498326

摘要

Given the widespread adoption of depth-sensing acquisition devices, RGB-D videos and related data/media have gained considerable traction in various aspects of daily life. Consequently, conducting salient object detection (SOD) in RGB-D videos presents a highly promising and evolving avenue. Despite the potential of this area, SOD in RGB-D videos remains somewhat under-explored, with RGB-D SOD and video SOD (VSOD) traditionally studied in isolation. To explore this emerging field, this paper makes two primary contributions: the dataset and the model. On one front, we construct the RDVS dataset, a new RGB-D VSOD dataset with realistic depth and characterized by its diversity of scenes and rigorous frame-by-frame annotations. We validate the dataset through comprehensive attribute and object-oriented analyses, and provide training and testing splits. Moreover, we introduce DCTNet+, a three-stream network tailored for RGB-D VSOD, with an emphasis on RGB modality and treats depth and optical flow as auxiliary modalities. In pursuit of effective feature enhancement, refinement, and fusion for precise final prediction, we propose two modules: the multi-modal attention module (MAM) and the refinement fusion module (RFM). To enhance interaction and fusion within RFM, we design a universal interaction module (UIM) and then integrate holistic multi-modal attentive paths (HMAPs) for refining multi-modal low-level features before reaching RFMs. Comprehensive experiments, conducted on pseudo RGB-D video datasets alongside our proposed RDVS, highlight the superiority of DCTNet+ over 19 VSOD models and 14 RGB-D SOD models. Additionally, insightful ablation experiments were performed on both pseudo and realistic RGB-D video datasets to demonstrate the advantages of individual modules as well as the necessity of introducing realistic depth into VSOD. Our code together with RDVS dataset will be available at https://github.com/kerenfu/RDVS/.

求助该文献

最长约 10秒，即可获得该文献文件

Salient Object Detection in RGB-D Videos

今日热心研友