A Local–Global Estimator Based on Large Kernel CNN and Transformer for Human Pose Estimation and Running Pose Measurement

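Keywords: Pose; Artificial intelligence; Computer science; Convolutional neural network; Location; Estimator; Encoder; Transformer; Pattern recognition (psychology); Computer vision; Machine learning; Engineering; Mathematics; Statistics; Operating system; Electrical engineering; Philosophy; Voltage; Linguistics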

Authors

Qingtian Wu, Yongfei Wu, Yu Zhang, Liming Zhang

Source

Journal: IEEE Transactions on Instrumentation and Measurement [Institute of Electrical and Electronics Engineers]
Date: 2022-01-01; Volume: 71, pp. 1-12; Times cited: 18

Identifier

DOI: 10.1109/tim.2022.3200438

Abstract

Running poses in a crowd can serve as an early warning of most abnormal events (e.g., chasing, fleeing, and robbery), and they can be detected through human behavior analysis based on human pose measurement. Although deep convolutional neural networks (CNNs) have made impressive progress in human pose estimation, how to further improve the trade-off between estimation accuracy and speed remains an open issue. In this work, we first propose an efficient local-global estimator for human pose estimation, called LGPose. Then, based on the keypoints estimated by LGPose, we define a simple regression model that uses the geometry of the joints to achieve fast and accurate running pose measurement. To model the relationships between human keypoints, a vision transformer (ViT) encoder is adopted to learn their long-range interdependencies at the pixel level. However, the transformer encoder operates on sequences: it linearly projects 2D image patches into 1D tokens, which discards important local information. Locality is nonetheless crucial because it relates to lines, edges, and shapes. To learn locality, we replace the original fully connected network in the feed-forward module of ViT with effective CNN modules. Experiments on the MPII and COCO Keypoint val2017 datasets show that the proposed LGPose achieves the best accuracy-speed trade-off among the compared state-of-the-art methods. Moreover, we build a lightweight running movement dataset to verify the effectiveness of LGPose. Based on the human poses estimated by LGPose, the regression model measures running pose with an accuracy of 86.4% without training any additional classifier. Our source code and running dataset will be made publicly available.
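The locality argument above (flattening patches into a 1D token sequence loses spatial neighbourhoods, so a convolution is placed inside the feed-forward module) can be illustrated with a minimal pure-Python sketch. This is not the authors' LGPose module: the function names (`tokens_to_grid`, `depthwise_conv2d`, `conv_feed_forward`), the zero padding, the ReLU, and the residual connection are assumptions chosen for illustration, and the learned pointwise projections, normalization, and attention of a real ViT block are omitted. The kernel size is a free parameter; a larger kernel simply widens the local neighbourhood each token sees.

```python
from typing import List

Token = List[float]         # one C-dimensional token
Kernel = List[List[float]]  # one k x k kernel (k odd)


def tokens_to_grid(tokens: List[Token], h: int, w: int) -> List[List[Token]]:
    """Undo the ViT flattening: token i returns to row i // w, column i % w."""
    if len(tokens) != h * w:
        raise ValueError("expected h * w tokens")
    return [[tokens[r * w + c] for c in range(w)] for r in range(h)]


def grid_to_tokens(grid: List[List[Token]]) -> List[Token]:
    """Flatten the H x W grid back into a row-major token sequence."""
    return [pixel for row in grid for pixel in row]


def depthwise_conv2d(grid: List[List[Token]], kernels: List[Kernel]) -> List[List[Token]]:
    """Stride-1, zero-padded depthwise convolution (cross-correlation, as in
    deep-learning frameworks): channel ci is filtered only by kernels[ci]."""
    h, w, ch = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * ch for _ in range(w)] for _ in range(h)]
    for ci in range(ch):
        k = len(kernels[ci])
        pad = k // 2
        for r in range(h):
            for c in range(w):
                acc = 0.0
                for dr in range(k):
                    for dc in range(k):
                        rr, cc = r + dr - pad, c + dc - pad
                        if 0 <= rr < h and 0 <= cc < w:  # zero padding outside
                            acc += kernels[ci][dr][dc] * grid[rr][cc][ci]
                out[r][c][ci] = acc
    return out


def conv_feed_forward(tokens: List[Token], h: int, w: int,
                      kernels: List[Kernel]) -> List[Token]:
    """Hypothetical local feed-forward step: reshape tokens to 2D, mix each
    token with its spatial neighbours, apply ReLU, add a residual."""
    grid = tokens_to_grid(tokens, h, w)
    mixed = grid_to_tokens(depthwise_conv2d(grid, kernels))
    return [[x + max(0.0, m) for x, m in zip(tok, mix)]
            for tok, mix in zip(tokens, mixed)]


# Usage: a 4 x 4 grid of 1-channel tokens with an impulse at token 0.
tokens = [[0.0] for _ in range(16)]
tokens[0] = [1.0]
ones3 = [[1.0] * 3 for _ in range(3)]
out = conv_feed_forward(tokens, 4, 4, [ones3])
print([t[0] for t in out])
# [2.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Only tokens 1, 4, and 5, the spatial neighbours of token 0 on the 2D grid, pick up its signal, while token 2 and the far corner stay at zero. A purely token-wise fully connected feed-forward layer cannot move information from token 0 to any other token, which is the local information the sequence view loses.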
