Keywords
Artificial intelligence
Computer science
Monocular
Computer vision
Feature
Fusion
Optical flow
Frame
Feature extraction
Computation
Image
Engineering
Philosophy
Linguistics
Telecommunications
Algorithm
Electrical engineering
Authors
Haoran Cheng, Pei Liang, Yufeng Zheng, Borong Lin, Xiaofei He, Boxi Wu
Source
Journal: IEEE Transactions on Image Processing
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume: 33, pp. 2665-2675
Identifiers
DOI: 10.1109/tip.2024.3378475
Abstract
Previous monocular 3D detection works focus on single-frame input in both training and inference. In real-world applications, temporal and motion information naturally exists in monocular video; it is valuable for 3D detection but under-explored in monocular works. In this paper, we propose a straightforward and effective method for temporal feature fusion that exhibits low computation cost and excellent transferability, making it conveniently applicable to various monocular models. Specifically, with the help of optical flow, we transform the backbone features produced by prior frames and fuse them into the current frame. We introduce a scene feature propagating mechanism that accumulates history scene features without extra time cost. In this process, occluded areas are removed via forward-backward scene consistency. Our method naturally introduces valuable temporal features, facilitating 3D reasoning in monocular 3D detection. Furthermore, accumulating history scene features via scene propagation mitigates the heavy computation overhead of video processing. Experiments conducted on various baselines demonstrate that the proposed method is model-agnostic and brings significant improvements to multiple types of single-frame methods.
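The abstract describes three ingredients: warping prior-frame backbone features to the current frame with optical flow, masking out occluded regions via forward-backward flow consistency, and fusing the warped history features into the current features. The sketch below illustrates that pipeline in NumPy. It is a minimal illustration, not the authors' implementation: the function names (`warp_features`, `occlusion_mask`, `fuse`), the nearest-neighbor sampling, the consistency tolerance, and the blending weight `alpha` are all assumptions chosen for clarity.

```python
import numpy as np

def warp_features(feat_prev, flow):
    """Backward-warp prior-frame features into the current frame.
    feat_prev: (H, W, C) features; flow: (H, W, 2) current->previous displacements.
    Nearest-neighbor sampling is used for simplicity (a real model would
    typically use bilinear sampling, e.g. grid_sample)."""
    H, W, _ = feat_prev.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return feat_prev[src_y, src_x]

def occlusion_mask(flow_fwd, flow_bwd, tol=1.0):
    """Forward-backward consistency check: a pixel counts as visible when the
    backward flow, sampled at the forward-flow target, roughly cancels the
    forward flow. Inconsistent (occluded) pixels get mask value 0."""
    bwd_at_target = warp_features(flow_bwd, flow_fwd)  # treat flow as 2-channel features
    err = np.linalg.norm(flow_fwd + bwd_at_target, axis=-1)
    return (err < tol).astype(np.float32)

def fuse(feat_cur, feat_prev, flow_fwd, flow_bwd, alpha=0.5):
    """Blend warped history features into the current frame, skipping
    occluded areas (alpha is an assumed fixed blending weight)."""
    warped = warp_features(feat_prev, flow_fwd)
    mask = occlusion_mask(flow_fwd, flow_bwd)[..., None]
    return feat_cur + alpha * mask * (warped - feat_cur)
```

Accumulating history in this style (each fused map becomes `feat_prev` for the next frame) mirrors the propagation idea in the abstract: only one warp-and-fuse step is paid per frame, rather than reprocessing the whole video history.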