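Computer science
Artificial intelligence
Pose
Context (archaeology)
Deep learning
Noise (video)
Pattern recognition (psychology)
Field (mathematics)
Focus (optics)
Machine learning
Mathematics
Image (mathematics)
Biology
Paleontology
Physics
Pure mathematics
Optics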
Authors
Chengang Dong,Guodong Du
Identifier
DOI:10.1038/s41598-024-58146-z
Abstract
Human pose estimation (HPE) based on deep learning aims to accurately estimate and predict human body posture in images or videos using deep neural networks. However, the accuracy of real-time HPE still needs improvement because of factors such as partial occlusion of body parts and the limited receptive field of the model. To alleviate the accuracy loss caused by these issues, this paper proposes a real-time HPE model called CCAM-Person, based on the YOLOv8 framework. First, we improve the backbone and neck of the YOLOv8x-pose real-time HPE model to alleviate feature loss and receptive field constraints. Second, we introduce the context coordinate attention module (CCAM) to strengthen the model’s focus on salient features, reduce background noise interference, alleviate keypoint regression failures caused by limb occlusion, and improve the accuracy of pose estimation. Our approach attains competitive results on multiple metrics on two open-source datasets, MS COCO 2017 and CrowdPose. Compared with the baseline model YOLOv8x-pose, CCAM-Person improves the average precision by 2.8% and 3.5% on the two datasets, respectively.
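The abstract does not say how average precision is computed. On MS COCO, keypoint AP is conventionally based on object keypoint similarity (OKS), which plays the role that IoU plays in object detection. The following is a minimal sketch of OKS for a single predicted pose, assuming the standard COCO definition applies here; the function name `oks` and its arguments are illustrative and are not taken from the paper.

```python
import math

def oks(pred, gt, visible, area, sigmas):
    """Object keypoint similarity between one predicted and one ground-truth pose.

    pred, gt : lists of (x, y) keypoint coordinates, same order and length
    visible  : per-keypoint visibility flags (> 0 means the keypoint is labeled)
    area     : ground-truth object area, used as the squared scale s^2
    sigmas   : per-keypoint falloff constants (illustrative values supplied by the caller)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2  # squared pixel distance
        k = 2.0 * sigma                       # COCO convention: k_i = 2 * sigma_i
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0
```

Under this convention, a prediction counts as a match when its OKS exceeds a threshold, and AP is averaged over a range of thresholds. Occluded or misplaced keypoints lower the OKS of an otherwise correct pose, which is why keypoint regression failures show up directly in AP.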