Computer science
Artificial intelligence
Transformer (machine learning model)
Natural language processing
Graph
Action recognition
Pattern recognition (psychology)
Machine learning
Theoretical computer science
Engineering
Electrical engineering
Voltage
Class (philosophy)
Authors
Libo Weng,Weidong Lou,Fei Gao
Source
Journal: Communications in Computer and Information Science
Date: 2023-11-26
Pages: 283-299
Identifier
DOI: 10.1007/978-981-99-8141-0_22
Abstract
The Transformer is a neural network architecture based on a self-attention mechanism; it originated in natural language processing and is now being introduced to computer vision. However, the Transformer has not been widely applied to human action recognition. Action recognition is typically formulated as a single classification task, and existing recognition algorithms do not fully exploit the semantic relationships among actions. In this paper, a new method named Language Guided Graph Transformer (LGGT) for skeleton-based action recognition is proposed. LGGT combines textual information with a Graph Transformer to incorporate semantic guidance into skeleton-based action recognition. Specifically, it employs a Graph Transformer as the encoder for skeleton data to extract feature representations and effectively capture long-distance dependencies between joints. In addition, LGGT uses a large-scale language model as a knowledge engine to generate textual descriptions specific to each action, capturing the semantic relationships between actions and helping the model understand, recognize, and classify different actions more accurately. We extensively evaluate the proposed method on the Smoking dataset, the Kinetics-Skeleton dataset, and the NTU RGB+D action dataset. The experimental results demonstrate significant performance improvements on these datasets, and an ablation study shows that introducing semantic guidance further enhances the model's performance.
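The abstract names two components: a Graph Transformer encoder over skeleton joints and language-derived semantic guidance for classification. The sketch below is a minimal illustration of that pattern, not the authors' implementation: self-attention over joints biased by the skeleton adjacency, and a cosine-similarity head that matches the pooled skeleton feature against precomputed text embeddings of per-action descriptions. The class names, layer sizes, adjacency handling, and the CLIP-style matching head are all assumptions for illustration only.

```python
# Minimal sketch of a graph-biased Transformer encoder with text-guided classification.
# Not the LGGT reference code; all dimensions and the matching head are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphTransformerLayer(nn.Module):
    """Self-attention over joints with an additive bias from the skeleton graph."""

    def __init__(self, dim, heads, adjacency):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        # Learnable scalar bias applied where two joints are connected in the skeleton.
        self.register_buffer("adjacency", adjacency.float())
        self.graph_bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):                                  # x: (batch, joints, dim)
        b, n, d = x.shape
        h = self.heads
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        q = q.view(b, n, h, d // h).transpose(1, 2)        # (b, h, n, d/h)
        k = k.view(b, n, h, d // h).transpose(1, 2)
        v = v.view(b, n, h, d // h).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) / (d // h) ** 0.5
        attn = attn + self.graph_bias * self.adjacency     # favor physically connected joints
        x = x + self.proj((attn.softmax(-1) @ v).transpose(1, 2).reshape(b, n, d))
        return x + self.ffn(self.norm2(x))


class LGGTSketch(nn.Module):
    """Skeleton encoder plus cosine matching against text embeddings of action descriptions."""

    def __init__(self, num_joints, in_channels, dim, heads, depth, adjacency):
        super().__init__()
        self.embed = nn.Linear(in_channels, dim)
        self.layers = nn.ModuleList(
            [GraphTransformerLayer(dim, heads, adjacency) for _ in range(depth)]
        )
        self.skeleton_head = nn.Linear(dim, dim)

    def forward(self, joints, text_embeddings):
        # joints: (batch, joints, in_channels); text_embeddings: (classes, dim), e.g.
        # LLM-generated action descriptions encoded offline by a frozen text encoder.
        x = self.embed(joints)
        for layer in self.layers:
            x = layer(x)
        feat = F.normalize(self.skeleton_head(x.mean(dim=1)), dim=-1)
        text = F.normalize(text_embeddings, dim=-1)
        return feat @ text.t()                             # similarity logits over action classes


if __name__ == "__main__":
    num_joints, classes, dim = 25, 60, 128
    adjacency = torch.eye(num_joints)                      # placeholder skeleton graph
    model = LGGTSketch(num_joints, in_channels=3, dim=dim, heads=4, depth=2, adjacency=adjacency)
    logits = model(torch.randn(8, num_joints, 3), torch.randn(classes, dim))
    print(logits.shape)                                    # torch.Size([8, 60])
```

In this sketch the adjacency bias is the "graph" part of the Graph Transformer, letting attention prefer physically connected joints while still allowing long-distance joint interactions, and the text-embedding head stands in for the semantic guidance the abstract attributes to the language model.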