Pose
Computer science
Computer vision
Artificial intelligence
Automation
Offset (computer science)
3D pose estimation
Transformation (genetics)
Optics (focusing)
Articulated human pose estimation
Ceiling (cloud)
Engineering
Physics
Optics
Gene
Chemistry
Mechanical engineering
Programming language
Structural engineering
Biochemistry
Authors
Songlin Du, Hao Wang, Zhiwei Yuan, Takeshi Ikenaga
Source
Journal: IEEE Transactions on Automation Science and Engineering
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Pages: 1-14
Cited by: 2
Identifiers
DOI: 10.1109/TASE.2023.3279928
Abstract
Automatically estimating 3D human poses in video and inferring their meanings play an essential role in many human-centered automation systems. Existing research has made remarkable progress by first estimating 2D human joints in video and then reconstructing the 3D human pose from those 2D joints. However, mono-directionally reconstructing 3D pose from 2D joints ignores the interaction between information in 3D space and 2D space, loses rich information from the original video, and therefore limits the ceiling of estimation accuracy. To this end, this paper proposes a bidirectional 2D-3D transformation framework that exchanges 2D and 3D information in both directions and utilizes video information to estimate an offset for refining the 3D human pose. In addition, a bone-length stability loss is employed to exploit human body structure, making the estimated 3D pose more natural and further increasing overall accuracy. In evaluation, the estimation error of the proposed method, measured by the mean per joint position error (MPJPE), is only 46.5 mm, which is much lower than state-of-the-art methods under the same experimental conditions. The improvement in accuracy will enable machines to better understand human poses for building superior human-centered automation systems.

Note to Practitioners: This paper was motivated by the demand of human-centered automation systems that need to accurately understand human poses. Existing approaches mainly focus on inferring 3D human pose from 2D joints mono-directionally. Although they have made remarkable contributions to estimating 3D human pose in such a mono-directional way, we found that they ignore the 2D-3D interaction and do not use the original video when inferring 3D pose from 2D joints. This paper therefore proposes a bidirectional 2D-3D transformation that exchanges 2D and 3D information and utilizes video information to estimate a more accurate 3D human pose for human-centered automation systems.
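The abstract reports accuracy as the mean per joint position error (MPJPE) and mentions a bone-length stability loss. A minimal NumPy sketch of both quantities follows; MPJPE is the standard metric, while the bone list and the exact loss formulation here are assumptions and may differ from the paper's definition:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per joint position error: the average Euclidean distance
    between predicted and ground-truth joints, in the input units
    (millimetres for the 46.5 mm figure in the abstract).
    pred, gt: arrays of shape (frames, joints, 3)."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def bone_length_stability_loss(poses, bones):
    """Penalize frame-to-frame variation of each bone's length.
    `bones` is a hypothetical list of (parent, child) joint index
    pairs; the paper's exact formulation may differ.
    poses: array of shape (frames, joints, 3)."""
    lengths = np.stack(
        [np.linalg.norm(poses[:, c] - poses[:, p], axis=-1)
         for p, c in bones],
        axis=1)  # (frames, num_bones)
    # Variance across frames is zero when every bone keeps its length,
    # which is exactly the "stable skeleton" the loss encourages.
    return float(np.mean(np.var(lengths, axis=0)))
```

A rigid pose sequence, in which every bone keeps a constant length, yields a loss of zero, while a prediction shifted from the ground truth by a constant (3, 4, 0) offset has an MPJPE of exactly 5 units.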
This work is a pioneering attempt at interactively using 2D and 3D information for more accurate estimation of human pose. Benefiting from its state-of-the-art accuracy, the proposed approach is expected to make significant contributions to many human-centered automation systems, such as human-machine interaction, biomimetic manipulation, and automatic surveillance systems.
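The pipeline the abstract describes (2D joint estimation, 2D-to-3D lifting, 3D-to-2D feedback, and a video-driven offset refinement) can be sketched with toy linear layers. All dimensions, weights, and function names below are illustrative assumptions, not the paper's actual network architecture:

```python
import numpy as np

# Hypothetical toy dimensions; the paper's real network sizes differ.
FRAMES, JOINTS, FEAT = 8, 17, 16

def lift_2d_to_3d(joints_2d, w_lift):
    """Forward direction: lift 2D joints to a coarse 3D pose."""
    return joints_2d @ w_lift          # (F, J, 2) -> (F, J, 3)

def project_3d_to_2d(joints_3d, w_proj):
    """Backward direction: map the 3D estimate back to 2D so the two
    representations can exchange information (the bidirectional loop)."""
    return joints_3d @ w_proj          # (F, J, 3) -> (F, J, 2)

def refine_with_video_offset(joints_3d, video_feat, w_off):
    """Predict a per-joint offset from video features and add it to the
    coarse 3D pose, as the abstract's offset refinement describes."""
    return joints_3d + video_feat @ w_off  # (F, J, 3)

rng = np.random.default_rng(0)
w_lift = rng.normal(size=(2, 3))
w_proj = rng.normal(size=(3, 2))
w_off = rng.normal(size=(FEAT, 3)) * 0.01

joints_2d = rng.normal(size=(FRAMES, JOINTS, 2))   # stand-in 2D detections
video_feat = rng.normal(size=(FRAMES, JOINTS, FEAT))  # stand-in video features

pose_3d = lift_2d_to_3d(joints_2d, w_lift)
joints_2d_feedback = project_3d_to_2d(pose_3d, w_proj)  # 2D-3D exchange
pose_3d = refine_with_video_offset(pose_3d, video_feat, w_off)
```

In a real system the three linear maps would be learned networks and the feedback projection would feed into the next refinement stage; the sketch only shows how information flows in both directions rather than one.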