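Transformer
Computer science
Artificial intelligence
Machine learning
Monocular
Pixel
Pattern recognition (psychology)
Computer vision
Engineering
Voltage
Electrical engineering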
Authors
Daechan Han, Jeongmin Shin, Namil Kim, Soonmin Hwang, Yukyung Choi
Source
Journal: IEEE Robotics and Automation Letters
Date: 2022-08-05
Volume (issue): 7 (4), pp. 10969-10976
Citations: 15
Identifiers
DOI: 10.1109/lra.2022.3196781
Abstract
Recently, transformers have been widely adopted for various computer vision tasks and have shown promising results owing to their ability to effectively encode long-range spatial dependencies in an image. However, few studies have applied transformers to self-supervised depth estimation. When the CNN architecture is replaced with a transformer in self-supervised depth learning, several problems arise, such as the multi-scale photometric loss becoming problematic when used with transformers and an insufficient ability to capture local details. In this letter, we propose an attention-based decoder module, Pixel-Wise Skip Attention (PWSA), to enhance fine details in feature maps while retaining the global context from transformers. In addition, we propose using a self-distillation loss together with a single-scale photometric loss to alleviate the instability of transformer training by providing correct training signals. We demonstrate that the proposed model makes accurate predictions on large objects and thin structures that require global context and local details. Our model achieves state-of-the-art performance among self-supervised monocular depth estimation methods on the KITTI and DDAD benchmarks.
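The abstract pairs a single-scale photometric loss with a self-distillation loss. Below is a minimal NumPy sketch of one way such an objective could look, under assumptions that are not taken from the letter. The photometric term uses the Monodepth2-style mix of SSIM and L1 (alpha = 0.85, 3x3 box-filter SSIM) and is computed only on the full-resolution prediction. The self-distillation term pulls upsampled coarser-scale depth maps toward the full-resolution map with an L1 penalty. The helper names, nearest-neighbour upsampling, the depth-list layout and `distill_weight=0.1` are hypothetical. View synthesis (warping the source frame with predicted depth and pose) is assumed to have already happened.

```python
import numpy as np


def box3(x):
    """3x3 mean filter with reflection padding over an (H, W, C) array."""
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="reflect")
    h, w = x.shape[:2]
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def photometric_error(target, warped, alpha=0.85):
    """Per-pixel alpha * (1 - SSIM) / 2 + (1 - alpha) * L1, averaged over channels.

    Assumption: Monodepth2-style weighting; `warped` is the source view already
    re-projected into the target frame using predicted depth and pose.
    """
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x, mu_y = box3(target), box3(warped)
    sigma_x = box3(target * target) - mu_x ** 2
    sigma_y = box3(warped * warped) - mu_y ** 2
    sigma_xy = box3(target * warped) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    ssim_term = np.clip((1 - num / den) / 2, 0, 1)
    l1_term = np.abs(target - warped)
    return (alpha * ssim_term + (1 - alpha) * l1_term).mean(axis=-1)


def upsample_nearest(depth, factor):
    """Nearest-neighbour upsampling of an (h, w) map by an integer factor."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)


def single_scale_loss(target, warped, depths, distill_weight=0.1):
    """Photometric loss on the full-resolution prediction only, plus an L1
    self-distillation term from that prediction to the coarser outputs.

    depths: hypothetical layout [full, 1/2, 1/4, ...], each scale halving H and W.
    """
    photo = photometric_error(target, warped).mean()
    # Teacher = full-resolution depth. With autograd (e.g. PyTorch) it would be
    # detached so that only the coarser outputs are pulled toward it.
    teacher = depths[0]
    distill = 0.0
    for k, coarse in enumerate(depths[1:], start=1):
        distill += np.abs(upsample_nearest(coarse, 2 ** k) - teacher).mean()
    return photo + distill_weight * distill
```

In this sketch, coarser scales receive no photometric gradient of their own. They are supervised only through the full-resolution prediction, which is one reading of "single-scale photometric loss with self-distillation." The letter itself should be consulted for the exact formulation.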