Computer science
Optics (focus)
Artificial intelligence
Skeleton (computer programming)
Joint (building)
Granularity
Key (lock)
Action (physics)
Action recognition
Pattern recognition (psychology)
Tuple
Computer vision
Mathematics
Discrete mathematics
Optics
Physics
Engineering
Operating system
Architectural engineering
Quantum mechanics
Computer security
Programming language
Class (philosophy)
Identifier
DOI:10.1016/j.patcog.2023.110188
Abstract
Joint-level and part-level information are crucial for modeling actions at different granularities. In addition, information relating different joints across consecutive frames is very useful for skeleton-based action recognition. To capture action information effectively, a new multi-grained clip focus network (MGCF-Net) is proposed. First, the skeleton sequence is divided into multiple clips, each containing several consecutive frames, and according to the structure of the human body, each clip is further divided into several tuples. An intra-clip attention module is then proposed to capture intra-clip action information: multi-head self-attention is split into two parts that obtain relevant information at the joint and part levels, respectively, and the information captured by these two parts is integrated into multi-grained contextual features. In addition, an inter-clip focus module captures the key information of several consecutive sub-actions, which helps distinguish similar actions. On two large-scale benchmarks for skeleton-based action recognition, our method achieves state-of-the-art performance, verifying its effectiveness.
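The pipeline described in the abstract can be sketched in plain numpy. This is an illustrative reconstruction, not the authors' implementation: the clip length, the 5-part grouping of a 25-joint skeleton, the mean-pooling of part features, and the single-head attention are all assumptions made here for clarity; MGCF-Net's actual modules (multi-head attention, learned projections, the inter-clip focus module) are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a skeleton sequence of T frames, V joints, C channels.
T, V, C = 64, 25, 3
seq = rng.standard_normal((T, V, C))

# Step 1: divide the sequence into clips of consecutive frames.
clip_len = 8  # assumed clip length, not from the paper
clips = seq.reshape(T // clip_len, clip_len, V, C)  # (num_clips, clip_len, V, C)

# Step 2: group joints into body-part tuples (an illustrative 5-part split
# for a 25-joint skeleton; the paper's exact grouping may differ).
parts = {
    "torso":     [0, 1, 2, 3, 20],
    "left_arm":  [4, 5, 6, 7, 21, 22],
    "right_arm": [8, 9, 10, 11, 23, 24],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def self_attention(x):
    """Single-head scaled dot-product self-attention over axis 0 (no learned
    projections -- a stand-in for the paper's multi-head attention)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

# Step 3 (intra-clip attention, sketched): joint-level attention over every
# frame-joint token of one clip, and part-level attention over pooled part
# features; the two context streams would then be fused.
clip = clips[0].reshape(clip_len * V, C)   # tokens = frame-joint pairs
joint_feats = self_attention(clip)         # joint-level context, (200, 3)

part_tokens = np.stack(
    [clips[0][:, idx].mean(axis=(0, 1)) for idx in parts.values()]
)
part_feats = self_attention(part_tokens)   # part-level context, (5, 3)

print(joint_feats.shape, part_feats.shape)  # (200, 3) (5, 3)
```

The inter-clip focus module would then operate across the `num_clips` axis, weighting the fused clip descriptors so that discriminative sub-actions dominate the final classification.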