Computer Vision
Artificial Intelligence
Computer Science
Monocular
Iterative Reconstruction
Computer Graphics (Images)
Authors
Xi Chen, Jiaming Sun, Yiming Xie, Hujun Bao, Xiaowei Zhou
Identifier
DOI:10.1109/tpami.2024.3393141
Abstract
We present a novel framework named NeuralRecon for real-time 3D scene reconstruction from a monocular video. Unlike previous methods that estimate single-view depth maps separately on each key-frame and fuse them later, we propose to directly reconstruct local surfaces represented as sparse TSDF volumes for each video fragment sequentially by a neural network. A learning-based TSDF fusion module based on gated recurrent units is used to guide the network to fuse features from previous fragments. This design allows the network to capture the local smoothness prior and global shape prior of 3D surfaces when sequentially reconstructing the surfaces, resulting in accurate, coherent, and real-time surface reconstruction. The fused features can also be used to predict semantic labels, allowing our method to reconstruct and segment the 3D scene simultaneously. Furthermore, we propose an efficient self-supervised fine-tuning scheme that refines scene geometry based on input images through differentiable volume rendering. This fine-tuning scheme improves reconstruction quality on the fine-tuned scenes as well as the generalization to similar test scenes. The experiments on ScanNet, 7-Scenes and Replica datasets show that our system outperforms state-of-the-art methods in terms of both accuracy and speed.
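The abstract's core mechanism is a gated-recurrent-unit fusion module that integrates per-fragment TSDF features with a hidden state carried across fragments. The sketch below is not the paper's implementation (NeuralRecon uses sparse 3D convolutional GRUs over voxel grids); it is a minimal per-voxel GRU update with hypothetical names (`VoxelGRUFusion`, `step`) and randomly initialized weights, intended only to illustrate the recurrent fusion idea: each fragment's features update the running state through reset and update gates.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class VoxelGRUFusion:
    """Illustrative per-voxel GRU cell (hypothetical; not the paper's sparse ConvGRU)."""

    def __init__(self, dim, rng):
        # Small random weights for illustration only; a real module is trained.
        init = lambda: rng.standard_normal((dim, dim)) * 0.1
        self.Wz, self.Uz = init(), init()  # update-gate weights
        self.Wr, self.Ur = init(), init()  # reset-gate weights
        self.Wh, self.Uh = init(), init()  # candidate-state weights

    def step(self, h, x):
        """Fuse one fragment's voxel features x (N, dim) into hidden state h (N, dim)."""
        z = sigmoid(x @ self.Wz + h @ self.Uz)          # how much to overwrite
        r = sigmoid(x @ self.Wr + h @ self.Ur)          # how much history to expose
        h_tilde = np.tanh(x @ self.Wh + (r * h) @ self.Uh)  # candidate fused state
        return (1.0 - z) * h + z * h_tilde              # gated blend of old and new

# Sequentially fuse features from three video fragments over 5 voxels.
rng = np.random.default_rng(0)
gru = VoxelGRUFusion(dim=8, rng=rng)
h = np.zeros((5, 8))  # hidden state starts empty before the first fragment
for _ in range(3):
    fragment_features = rng.standard_normal((5, 8))
    h = gru.step(h, fragment_features)
```

After the loop, `h` holds the fused per-voxel features; in the paper these feed the TSDF and semantic-label predictions, whereas here they are just random-weight outputs.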