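Artificial intelligence
Feature (linguistics)
Computer science
Pattern recognition (psychology)
Computer vision
Geometry
Computational geometry
Mathematics
Linguistics
Philosophy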
Authors
Zhongkai Zhou, Xinnan Fan, Pengfei Shi, Yuanxue Xin, Dongliang Duan, Liuqing Yang
Identifier
DOI:10.1109/tpami.2024.3420165
Abstract
The U-Net-like coarse-to-fine network design is currently the dominant choice for dense prediction tasks. Although this design often achieves competitive performance, it has inherent limitations, such as the propagation of training errors from low to high resolution and a dependency on deeper, heavier backbones. To address these limitations, we propose Recurrent Multiscale Feature Modulation (R-MSFM), a new lightweight network design for self-supervised monocular depth estimation. R-MSFM extracts per-pixel features, builds a multiscale feature modulation module, and performs recurrent depth refinement through a parameter-shared decoder at a fixed resolution. This design keeps the R-MSFM architecture lightweight and fundamentally avoids the error propagation caused by the coarse-to-fine design. Furthermore, we introduce a mask geometry consistency loss that encourages R-MSFM to learn geometrically consistent depth. This loss penalizes the inconsistency of the estimated depths between adjacent views within the nonoccluded and nonstationary regions. Experimental results demonstrate the superiority of R-MSFM in both model size and inference speed, and show state-of-the-art results on two datasets: KITTI and Make3D. The code is available at https://github.com/jsczzzk/R-MSFM.
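The abstract describes a loss that penalizes depth inconsistency between adjacent views only inside a valid region. The sketch below illustrates that idea in NumPy under stated assumptions: the function name `masked_geometry_consistency`, the normalized-difference form |Da - Db| / (Da + Db), and the boolean mask input are all hypothetical choices for illustration, since the abstract does not give the exact formulation. It also assumes the two depth maps have already been aligned to the same view by warping, which is not shown here.

```python
import numpy as np


def masked_geometry_consistency(depth_warped, depth_sampled, mask, eps=1e-7):
    """Mean normalized depth difference over pixels where mask is True.

    depth_warped:  depth from the adjacent view, projected into the target view
                   (assumed to be precomputed).
    depth_sampled: depth predicted for the target view at the same pixels.
    mask:          True for pixels assumed nonoccluded and nonstationary.
    """
    depth_warped = np.asarray(depth_warped, dtype=np.float64)
    depth_sampled = np.asarray(depth_sampled, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)

    # Normalized absolute difference; lies in [0, 1) for positive depths.
    diff = np.abs(depth_warped - depth_sampled) / (depth_warped + depth_sampled + eps)

    # With no valid pixels there is nothing to penalize.
    if mask.sum() == 0:
        return 0.0
    return float(diff[mask].mean())


# Toy 2x2 example: only the bottom-left pixel disagrees between the views.
d_a = np.array([[2.0, 4.0], [1.0, 3.0]])
d_b = np.array([[2.0, 4.0], [3.0, 3.0]])
valid = np.array([[True, True], [False, True]])

loss_masked = masked_geometry_consistency(d_a, d_b, valid)
loss_unmasked = masked_geometry_consistency(d_a, d_b, np.ones_like(valid))
```

In this toy case the disagreeing pixel is excluded by the mask, so the masked loss is zero while the unmasked loss is not. That is the role the abstract attributes to restricting the penalty to nonoccluded and nonstationary regions.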