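Computer science
Artificial intelligence
Initialization
Benchmark (surveying)
Pattern recognition (psychology)
RGB color model
Feature (linguistics)
Annotation
Machine learning
Feature learning
Geodesy
Linguistics
Philosophy
Programming language
Geography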
Authors
Xiaoqi Zhao,Youwei Pang,Lihe Zhang,Huchuan Lu,Xiang Ruan
Source
Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence
[Association for the Advancement of Artificial Intelligence (AAAI)]
Date: 2022-06-28
Volume (issue): 36 (3), pp. 3463-3471
Citations: 29
Identifiers
DOI:10.1609/aaai.v36i3.20257
Abstract
Existing CNN-based RGB-D salient object detection (SOD) networks all need to be pretrained on ImageNet to learn hierarchical features, which provide a good initialization. However, collecting and annotating large-scale datasets is time-consuming and expensive. In this paper, we use self-supervised representation learning (SSL) to design two pretext tasks: a cross-modal auto-encoder and depth-contour estimation. These pretext tasks require only a few unlabeled RGB-D datasets for pretraining, which lets the network capture rich semantic contexts and reduces the gap between the two modalities, thereby providing an effective initialization for the downstream task. In addition, to address the inherent problem of cross-modal fusion in RGB-D SOD, we propose a consistency-difference aggregation (CDA) module that splits a single feature fusion into multi-path fusion so that both consistent and differential information are adequately perceived. The CDA module is general and suits both cross-modal and cross-level feature fusion. Extensive experiments on six benchmark datasets show that our self-supervised pretrained model performs favorably against most state-of-the-art methods pretrained on ImageNet. The source code will be publicly available at https://github.com/Xiaoqi-Zhao-DLUT/SSLSOD.
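The abstract does not specify how the CDA module computes its paths, so the following is only a minimal sketch of the general idea of splitting one fusion into several paths. It is not the authors' implementation. The function name `consistency_difference_fuse`, the choice of element-wise product for the consistency path, absolute difference for the difference path, and a plain sum for aggregation are all assumptions made for illustration; a real module would operate on feature maps with learned convolutions.

```python
# Hypothetical illustration only; NOT the authors' CDA implementation.
# All operators below are assumptions chosen to make the idea concrete.

def consistency_difference_fuse(rgb_feat, depth_feat):
    """Fuse two equal-length feature vectors along three assumed paths."""
    if len(rgb_feat) != len(depth_feat):
        raise ValueError("feature vectors must have the same length")

    pairs = list(zip(rgb_feat, depth_feat))

    # Consistency path (assumed): the product is large only where
    # both modalities respond.
    consistent = [r * d for r, d in pairs]

    # Difference path (assumed): the absolute difference highlights
    # where the two modalities disagree.
    different = [abs(r - d) for r, d in pairs]

    # Joint path (assumed): plain addition, i.e. a single-path baseline.
    joint = [r + d for r, d in pairs]

    # Aggregate the three paths by summation (a stand-in for learned layers).
    return [c + x + j for c, x, j in zip(consistent, different, joint)]


rgb = [1.0, 0.0, 2.0]
depth = [1.0, 3.0, 0.0]
fused = consistency_difference_fuse(rgb, depth)
print(fused)  # [3.0, 6.0, 4.0]
```

In this toy input, the first position is where both modalities agree, so the consistency path contributes; the other two positions are where only one modality responds, so the difference path contributes instead.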