Artificial intelligence
Computer vision
Computer science
Fusion
Monocular
Transformer
Sensor fusion
Engineering
Philosophy
Linguistics
Voltage
Electrical engineering
Source
Journal: IEEE Sensors Journal
[Institute of Electrical and Electronics Engineers]
Date: 2024-04-15
Volume/Issue: 24 (8): 13620-13628
Identifier
DOI: 10.1109/jsen.2024.3370821
Abstract
Depth estimation from a monocular vision sensor is a fundamental problem in scene perception with wide industrial applications. Previous works tend to predict scene depth from high-level features obtained by convolutional neural networks (CNNs) or rely on encoder–decoder Transformer frameworks. However, they achieve less satisfactory results, especially around object contours. In this article, we propose a Transformer-based contour-aware depth estimation module that recovers scene depth with the aid of enhanced perception of object contours. In addition, we develop a cascaded multiscale fusion module to aggregate multilevel features, combining global context with local information and refining the depth map to a higher resolution from coarse to fine. Finally, we model depth estimation as a classification problem and discretize the depth values adaptively to further improve the performance of our network. Extensive experiments on mainstream public datasets (KITTI and NYUv2) demonstrate the effectiveness of our network, which exhibits superior performance against other state-of-the-art methods.
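The abstract's final idea, treating depth estimation as classification over adaptively discretized depth bins, can be sketched as follows. This is a minimal illustration of the general technique (in the spirit of adaptive-bin methods), not the paper's exact formulation: the function names, the normalization of predicted bin widths, and the soft-argmax readout are all assumptions for illustration.

```python
import numpy as np

def adaptive_bin_centers(bin_widths, d_min=0.0, d_max=8.0):
    """Turn predicted positive bin-width scores into depth-bin centers.

    bin_widths: hypothetical per-image network outputs, one score per bin.
    The widths are normalized to partition [d_min, d_max] adaptively.
    """
    w = bin_widths / bin_widths.sum()                      # fractions of the range
    edges = d_min + (d_max - d_min) * np.cumsum(np.concatenate([[0.0], w]))
    return 0.5 * (edges[:-1] + edges[1:])                  # midpoint of each bin

def depth_from_logits(logits, centers):
    """Classification readout: softmax over bins, depth = sum_i p_i * c_i."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p = e / e.sum(axis=-1, keepdims=True)                  # per-pixel bin probabilities
    return (p * centers).sum(axis=-1)                      # expected depth (soft-argmax)

# Example: 4 bins over [0, 8] m with unequal predicted widths
widths = np.array([1.0, 2.0, 1.0, 4.0])
centers = adaptive_bin_centers(widths)                     # [0.5, 2.0, 3.5, 6.0]
depth = depth_from_logits(np.array([0.0, 3.0, 0.0, 0.0]), centers)
```

Because the bin edges come from network predictions, the discretization can concentrate resolution where depth values actually occur in a given scene, which is the advantage claimed over fixed uniform binning.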