Hypergraph
Dual (grammatical number)
Computer science
RGB color model
Pattern recognition (psychology)
Artificial intelligence
Convolution (computer science)
Action (physics)
Frame (networking)
Temporal database
Spatial analysis
Mathematics
Artificial neural network
Data mining
Statistics
Telecommunications
Quantum mechanics
Discrete mathematics
Literature
Physics
Art
Authors
Zhixuan Wu, Nan Ma, Cheng Wang, Cheng Xu, Genbao Xu, Mingxing Li
Identifier
DOI:10.1016/j.patcog.2024.110427
Abstract
To address irrelevant frames and high model complexity in action recognition, we propose a Spatial-Temporal Hypergraph based on Dual-Stage Attention Network (STHG-DAN) for lightweight multi-view action recognition. It comprises two stages: a Temporal Attention Mechanism based on a Trainable Threshold (TAM-TT) and Hypergraph Convolution based on a Dynamic Spatial-Temporal Attention Mechanism (HG-DSTAM). In the first stage, TAM-TT uses a learnable threshold to extract keyframes from multi-view videos, with the multi-view data ensuring that more comprehensive information is available to the subsequent stage. In the second stage, HG-DSTAM divides the human joints into three parts (trunk, hands, and legs) to build spatial-temporal hypergraphs, extracts high-order features from the hypergraphs constructed from multi-view body joints, feeds them into the dynamic spatial-temporal attention mechanism, and learns the intra-frame correlations between the joint features of body parts across views, thereby locating the salient regions of an action. Multi-scale convolutions and depthwise separable networks enable efficient action recognition with few trainable parameters. Experiments on the NTU-RGB+D, NTU-RGB+D 120, and imitating traffic police gesture datasets show that the model outperforms existing algorithms in both efficiency and accuracy, effectively improving the cognitive ability of machines in human body-language interaction.
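To make the first stage concrete, below is a minimal sketch of a temporal attention gate with a trainable threshold, in the spirit of TAM-TT as described in the abstract. The class name, the soft sigmoid gate, the fixed sharpness constant, and the tensor shapes are all assumptions for illustration; the paper's exact formulation may differ.

```python
# Sketch: per-frame attention scores compared against a learnable threshold,
# applied as a soft (differentiable) gate that damps irrelevant frames.
import torch
import torch.nn as nn

class TrainableThresholdTemporalAttention(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)            # per-frame relevance score
        self.threshold = nn.Parameter(torch.zeros(1))  # trainable threshold (assumed)
        self.sharpness = 10.0                          # slope of the soft gate (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, feat_dim) frame features from one or more views
        s = torch.sigmoid(self.score(x))               # (batch, T, 1), scores in [0, 1]
        # Soft keyframe gate: ~1 when the score exceeds the learned threshold
        gate = torch.sigmoid(self.sharpness * (s - torch.sigmoid(self.threshold)))
        return x * gate                                # irrelevant frames are suppressed

# usage: 2 clips, 32 frames, 256-d frame features
feats = torch.randn(2, 32, 256)
keyframes = TrainableThresholdTemporalAttention(256)(feats)
```

The soft gate keeps the threshold comparison differentiable so the threshold itself can be learned end-to-end; a hard top-k selection would be an alternative at inference time.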
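For the second stage, the following sketch shows one hypergraph convolution layer over body joints grouped into trunk / hand / leg hyperedges, as the abstract describes. The normalization follows the standard hypergraph convolution X' = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Θ with uniform edge weights; the 9-joint skeleton and its partition are invented here for illustration and are not the paper's actual layout, and the dynamic spatial-temporal attention of HG-DSTAM is omitted.

```python
# Sketch: hypergraph convolution over a skeleton whose joints are grouped
# into three hyperedges (trunk, hands, legs).
import torch
import torch.nn as nn

def incidence_from_groups(num_joints, groups):
    # H[v, e] = 1 if joint v belongs to hyperedge e
    H = torch.zeros(num_joints, len(groups))
    for e, joints in enumerate(groups):
        H[joints, e] = 1.0
    return H

class HypergraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, H):
        super().__init__()
        Dv = H.sum(1)   # vertex degrees
        De = H.sum(0)   # hyperedge degrees
        inv_sqrt_Dv = torch.diag(Dv.pow(-0.5))
        inv_De = torch.diag(De.pow(-1.0))
        # Fixed propagation matrix (uniform hyperedge weights W = I)
        self.register_buffer("prop", inv_sqrt_Dv @ H @ inv_De @ H.T @ inv_sqrt_Dv)
        self.theta = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (batch, num_joints, in_dim) joint features for one frame
        return torch.relu(self.prop @ self.theta(x))

# hypothetical 3-part partition of a 9-joint skeleton: trunk, hands, legs
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
layer = HypergraphConv(64, 64, incidence_from_groups(9, groups))
out = layer(torch.randn(2, 9, 64))
```

Because each hyperedge connects a whole body part rather than a single bone, one propagation step already mixes features within each part, which is what lets the layer capture the high-order (beyond-pairwise) joint relations the abstract refers to.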