Computer science
Pipeline (software)
Pose
Artificial intelligence
Ground truth
Dependency (UML)
3D pose estimation
Machine learning
Projection (relational algebra)
Quality (philosophy)
Scratch
3D model
Pattern recognition (psychology)
Computer vision
Algorithm
Philosophy
Operating system
Programming language
Epistemology
Authors
Hanbyul Joo,Natalia Neverova,Andrea Vedaldi
Identifier
DOI: 10.1109/3dv53792.2021.00015
Abstract
Differently from 2D image datasets such as COCO, large-scale human datasets with 3D ground-truth annotations are very difficult to obtain in the wild. In this paper, we address this problem by augmenting existing 2D datasets with high-quality 3D pose fits. Remarkably, the resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks such as 3DPW. Additionally, training on our augmented data is straightforward as it does not require mixing multiple, incompatible 2D and 3D datasets or using complicated network architectures and training procedures. This simplified pipeline affords additional improvements, including injecting extreme crop augmentations to better reconstruct highly truncated people, and incorporating auxiliary inputs to improve 3D pose estimation accuracy. It also reduces the dependency on 3D datasets such as H36M that have restrictive licenses. We also use our method to introduce new benchmarks for the study of real-world challenges such as occlusions, truncations, and rare body poses. In order to obtain such high-quality 3D pseudo-annotations, inspired by progress in internal learning, we introduce Exemplar Fine-Tuning (EFT). EFT combines the re-projection accuracy of fitting methods like SMPLify with a 3D pose prior implicitly captured by a pre-trained 3D pose regressor network. We show that EFT produces 3D annotations that result in better downstream performance and are qualitatively preferable in an extensive human-based assessment.
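The abstract describes EFT as a procedure: take a pre-trained 3D pose regressor and, for each image with 2D keypoint annotations, fine-tune a copy of the network on that single exemplar by minimizing the 2D re-projection error, so that the prior learned by the network constrains the 3D fit. The Python sketch below illustrates this idea under simplifying assumptions; the PoseRegressor class, the weak-perspective projection helper, and the tensor shapes are illustrative stand-ins, not the authors' implementation, which regresses SMPL body-model parameters.

# Minimal sketch of Exemplar Fine-Tuning (EFT), assuming a pre-trained regressor
# that maps an image crop to 3D joints plus a weak-perspective camera.
# PoseRegressor, project_weak_perspective, and eft_fit are hypothetical names.
import copy
import torch
import torch.nn as nn

class PoseRegressor(nn.Module):
    """Stand-in for a pre-trained 3D pose regressor (e.g. an HMR/SPIN-style network)."""
    def __init__(self, num_joints: int = 24):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 256), nn.ReLU())
        self.joints3d_head = nn.Linear(256, num_joints * 3)   # 3D joint positions
        self.camera_head = nn.Linear(256, 3)                   # scale, tx, ty

    def forward(self, img):
        feat = self.backbone(img)
        joints3d = self.joints3d_head(feat).view(img.shape[0], -1, 3)
        cam = self.camera_head(feat)
        return joints3d, cam

def project_weak_perspective(joints3d, cam):
    """Weak-perspective projection of 3D joints onto the image plane."""
    scale, trans = cam[:, :1], cam[:, 1:]
    return scale.unsqueeze(-1) * joints3d[..., :2] + trans.unsqueeze(1)

def eft_fit(pretrained, img, keypoints2d, kp_conf, steps=50, lr=1e-5):
    """Fine-tune a copy of the regressor on a single exemplar by minimizing the
    confidence-weighted 2D re-projection error; the pre-trained weights act as an
    implicit 3D pose prior. Returns the fitted 3D joints."""
    model = copy.deepcopy(pretrained)          # never overwrite the original weights
    model.train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        joints3d, cam = model(img)
        proj2d = project_weak_perspective(joints3d, cam)
        loss = (kp_conf.unsqueeze(-1) * (proj2d - keypoints2d) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        joints3d, _ = model(img)
    return joints3d                            # used as the 3D pseudo-annotation

# Usage: one exemplar image with annotated 2D keypoints and per-joint confidences.
regressor = PoseRegressor()                    # in practice, load pre-trained weights here
img = torch.randn(1, 3, 224, 224)
keypoints2d = torch.randn(1, 24, 2)
kp_conf = torch.ones(1, 24)
pseudo_3d = eft_fit(regressor, img, keypoints2d, kp_conf)

In this sketch only the fitted 3D output is kept as the pseudo-annotation; the per-exemplar weight updates are discarded (hence the deepcopy) rather than accumulated across images.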