Artificial intelligence
Computer science
Pose
Transformer
Computer vision
Sensor fusion
Pattern recognition (psychology)
Engineering
Voltage
Electrical engineering
Authors
Hailun Xia,Qiang Zhang
Identifier
DOI:10.1109/iccc56324.2022.10065997
Abstract
Fusion of features from multiple views is an effective means of improving multi-view 3D pose estimation; it relies on robust, accurate 2D pose estimation and on effective fusion methods. If the same fusion weights are applied to predictions of differing accuracy, a "Bad-prediction" can degrade a "Well-prediction" during the fusion process. To address these issues, inspired by previous vision transformer work, we propose a transformer framework for multi-view 3D pose estimation named VitPose, which aims to strengthen the mutual constraints among human articulation points by enhancing the large-scale information of image predictions, rather than calibrating after prediction. In addition, we design a simple feedback mechanism for training the fusion weights, which avoids interference from "Bad-predictions" with "Well-predictions". We also add multi-view geometric calibration, which introduces the spatial information of each view into the transformer structure to strengthen the connection between two views. Extensive experiments on Human3.6M show that our approach achieves competitive results. Specifically, we achieve 17.0 mm Mean Per Joint Position Error (MPJPE) on Human3.6M at 384×384 resolution, which is state-of-the-art (SOTA) among methods using vanilla triangulation.
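For reference, the reported MPJPE metric averages the Euclidean distance between predicted and ground-truth 3D joint positions over all joints and frames. A minimal sketch in Python (the array shapes and the toy data are illustrative assumptions, not the paper's evaluation code):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error, in the units of the input (e.g. mm).

    pred, gt: arrays of shape (num_frames, num_joints, 3) holding 3D
    joint coordinates. Returns the mean per-joint Euclidean distance.
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Toy example: every predicted joint is offset by 17 mm along the x-axis,
# so the metric evaluates to exactly 17.0.
gt = np.zeros((2, 17, 3))          # 2 frames, 17 joints (Human3.6M skeleton)
pred = gt.copy()
pred[..., 0] += 17.0
print(mpjpe(pred, gt))  # 17.0
```

Human3.6M evaluations conventionally use a 17-joint skeleton and report MPJPE in millimeters, which is why the paper's 17.0 mm figure is directly comparable across methods.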