Computer science
Artificial intelligence
Computer vision
Bundle adjustment
Structure from motion
Segmentation
Reprojection error
Monocular
Parallax
Motion estimation
Undersampling
Usable
Motion (physics)
Computer graphics (images)
Image (mathematics)
World Wide Web
Authors
Zhoutong Zhang,Forrester Cole,Zhengqi Li,Michael Rubinstein,Noah Snavely,William T. Freeman
Identifier
DOI:10.1007/978-3-031-19827-4_2
Abstract
Casual videos, such as those captured in daily life using a hand-held camera, pose problems for conventional structure-from-motion (SfM) techniques: the camera is often roughly stationary (not much parallax), and a large portion of the video may contain moving objects. Under such conditions, state-of-the-art SfM methods tend to produce erroneous results, often failing entirely. To address these issues, we propose CasualSAM, a method to estimate camera poses and dense depth maps from a monocular, casually-captured video. Like conventional SfM, our method performs a joint optimization over 3D structure and camera poses, but uses a pretrained depth prediction network to represent 3D structure rather than sparse keypoints. In contrast to previous approaches, our method does not assume motion is rigid or determined by semantic segmentation, instead optimizing for a per-pixel motion map based on reprojection error. Our method sets a new state-of-the-art for pose and depth estimation on the Sintel dataset, and produces high-quality results for the DAVIS dataset where most prior methods fail to produce usable camera poses.
Keywords: Structure from motion, Depth estimation, Casual video
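The abstract describes a joint optimization over camera poses, depth initialized from a pretrained network, and a per-pixel motion map driven by reprojection error. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of that idea under heavy simplifying assumptions: random toy frames stand in for a video, a random `init_depth` tensor stands in for the pretrained depth network's output, poses are translation-only, and the intrinsics are a fixed pinhole model. All names (`init_depth`, `motion_logits`, `reproject`) are illustrative.

```python
import torch
import torch.nn.functional as F

T, H, W = 8, 48, 64                              # tiny toy "video"
frames = torch.rand(T, 1, H, W)                  # grayscale frames (placeholder data)
init_depth = 1.0 + torch.rand(T, 1, H, W)        # stand-in for pretrained depth predictions
f, cx, cy = 50.0, (W - 1) / 2.0, (H - 1) / 2.0   # assumed pinhole intrinsics

# Variables optimized jointly, as in the abstract: camera motion, a per-frame
# depth-scale correction, and a per-pixel motion map.
trans = torch.zeros(T, 3, requires_grad=True)                 # translation-only "poses"
log_scale = torch.zeros(T, requires_grad=True)                # per-frame depth scale
motion_logits = torch.zeros(T, 1, H, W, requires_grad=True)   # per-pixel motion map

opt = torch.optim.Adam([trans, log_scale, motion_logits], lr=1e-2)

vs, us = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                        torch.arange(W, dtype=torch.float32), indexing="ij")

def reproject(src, depth, tvec):
    """Unproject a frame's pixels with its depth, translate by tvec, reproject,
    and bilinearly sample the neighboring frame `src` at those locations."""
    z = depth[0]                              # (H, W)
    x = (us - cx) / f * z
    y = (vs - cy) / f * z
    xp, yp, zp = x + tvec[0], y + tvec[1], (z + tvec[2]).clamp(min=1e-3)
    up = f * xp / zp + cx
    vp = f * yp / zp + cy
    grid = torch.stack((2 * up / (W - 1) - 1,  # normalize to [-1, 1] for grid_sample
                        2 * vp / (H - 1) - 1), dim=-1)
    return F.grid_sample(src.unsqueeze(0), grid.unsqueeze(0),
                         align_corners=True).squeeze(0)

for step in range(300):
    opt.zero_grad()
    loss = torch.zeros(())
    for t in range(T - 1):
        depth = init_depth[t] * log_scale[t].exp()
        warped = reproject(frames[t + 1], depth, trans[t])
        residual = (warped - frames[t]).abs()
        w = torch.sigmoid(-motion_logits[t])  # ~1 = static pixel, ~0 = moving pixel
        # Motion-weighted reprojection error, plus a prior that discourages
        # labeling everything as moving.
        loss = loss + (w * residual).mean() - 1e-2 * torch.log(w + 1e-6).mean()
    loss.backward()
    opt.step()
```

The point mirrored from the abstract is the per-pixel weight `w`: pixels whose reprojection error cannot be reduced by adjusting camera motion or depth scale are down-weighted, rather than being handled by a rigid-motion assumption or a semantic segmentation mask.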