Keywords: pose estimation, encoder, Transformer, graph, neural coding, generalization, pattern recognition, machine learning, artificial intelligence, computer science
Authors
Zhangmeng Chen, Ju Dai, Junxuan Bai, Junjun Pan
Identifier
DOI: 10.1016/j.patcog.2024.110446
Abstract
Despite significant progress in monocular 3D human pose estimation, the task remains challenging due to self-occlusions and depth ambiguities. To tackle these issues, we propose a novel Dynamic Graph Transformer (DGFormer) that exploits local and global relationships between skeleton joints for pose estimation. Specifically, the proposed DGFormer consists of three core modules: a Transformer Encoder (TE), an immobile Graph Convolutional Network (GCN), and a dynamic GCN. The TE module leverages the self-attention mechanism to learn complex global relationships among skeleton joints. The immobile GCN captures the local physical connections between human joints, while the dynamic GCN learns sparse dynamic K-nearest-neighbor interactions that vary with the action pose. By modeling the global long-range, local physical, and sparse dynamic dependencies of human joints, our method predicts 3D poses with lower errors on the Human3.6M and MPI-INF-3DHP datasets, outperforming recent state-of-the-art image-based methods. Furthermore, experiments on in-the-wild videos demonstrate the impressive generalization ability of our method. Code will be available at: https://github.com/czmmmm/DGFormer.
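The abstract describes a three-branch design: a Transformer encoder for long-range joint relations, a fixed-adjacency GCN for the skeleton's physical bone connections, and a dynamic GCN that rebuilds a K-nearest-neighbor graph from the current joint features. The PyTorch sketch below illustrates that general idea only; every module name, dimension, the KNN construction, and the residual-sum fusion are assumptions made for illustration, not the authors' implementation (see the linked repository for the official code).

```python
# Minimal sketch of the three-branch idea described in the abstract.
# All names, shapes, and the fusion-by-summation choice are assumptions,
# NOT the authors' code (see https://github.com/czmmmm/DGFormer).
import torch
import torch.nn as nn
import torch.nn.functional as F


def knn_adjacency(x, k):
    """Row-normalized adjacency over each joint's k nearest feature-space neighbors.

    x: (B, J, C) per-joint features; returns (B, J, J).
    """
    dist = torch.cdist(x, x)                       # pairwise distances (B, J, J)
    idx = dist.topk(k + 1, largest=False).indices  # self plus k neighbors
    adj = torch.zeros_like(dist)
    adj.scatter_(2, idx, 1.0)                      # mark neighbor edges
    return adj / adj.sum(dim=-1, keepdim=True)     # row-normalize


class GCNBranch(nn.Module):
    """One graph-convolution step: aggregate over an adjacency, then project."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, adj):
        return F.gelu(self.proj(adj @ x))


class DGFormerBlockSketch(nn.Module):
    """Transformer encoder + immobile (skeleton) GCN + dynamic KNN GCN."""

    def __init__(self, dim, num_joints, skeleton_adj, heads=4, k=3):
        super().__init__()
        self.k = k
        # Immobile adjacency from the skeleton's bone connections (plus self-loops).
        adj = skeleton_adj + torch.eye(num_joints)
        self.register_buffer("fixed_adj", adj / adj.sum(dim=-1, keepdim=True))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fixed_gcn = GCNBranch(dim)
        self.dyn_gcn = GCNBranch(dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):  # x: (B, J, C) per-joint features
        h = self.norm(x)
        global_feat, _ = self.attn(h, h, h)                    # global long-range
        local_feat = self.fixed_gcn(h, self.fixed_adj)         # physical connections
        dyn_feat = self.dyn_gcn(h, knn_adjacency(h, self.k))   # pose-dependent KNN
        return x + global_feat + local_feat + dyn_feat         # residual fusion (assumed)
```

For example, with Human3.6M's 17-joint skeleton one would pass `num_joints=17` and a 17x17 bone-connection matrix as `skeleton_adj`; the dynamic branch then re-estimates its neighbor graph per forward pass, so joints that are distant on the skeleton but correlated in a given pose can still exchange information.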