Authors
Jun Kong, Yuhang Bian, Min Jiang
Source
Journal: IEEE Signal Processing Letters
[Institute of Electrical and Electronics Engineers]
Date: 2022-01-01
Volume/Issue: 29: 528-532
Citations: 30
Identifier
DOI: 10.1109/lsp.2022.3142675
Abstract
In skeleton-based action recognition, long-term temporal dependencies are significant cues for sequential skeleton data. State-of-the-art methods rarely have access to long-term temporal information due to the limitations of their receptive fields. Meanwhile, most recent multi-branch methods only consider different input modalities and ignore information at various temporal scales. To address these issues, we propose a multi-scale temporal transformer (MTT) for skeleton-based action recognition in this letter. Firstly, the raw skeleton data are embedded by graph convolutional network (GCN) blocks and multi-scale temporal embedding modules (MT-EMs), which are designed as multiple branches extracting features at various temporal scales. Secondly, we introduce transformer encoders (TE) to integrate the embeddings and model long-term temporal patterns. Moreover, we propose a task-oriented lateral connection (LaC) that aligns semantic hierarchies by distributing input embeddings to the downstream transformer encoders according to their semantic levels. Finally, classification heads aggregate the TE outputs and predict the action categories. Experiments demonstrate the efficiency and generality of the proposed method, which achieves state-of-the-art results on three large datasets: NTU-RGBD 60, NTU-RGBD 120, and Kinetics-Skeleton 400.
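The multi-branch idea described above can be illustrated with a minimal sketch: the same skeleton sequence is summarised at several temporal scales before being handed to a sequence model. The function names and the simple average-pooling scheme below are illustrative assumptions, not the authors' MT-EM implementation.

```python
# Hypothetical sketch of multi-scale temporal embedding (not the paper's code):
# each branch pools the frame sequence over a different window size, so the
# downstream encoder sees the motion at several temporal resolutions.

def temporal_pool(frames, window):
    """Average-pool per-frame feature vectors over non-overlapping
    windows of `window` frames."""
    pooled = []
    for start in range(0, len(frames) - window + 1, window):
        chunk = frames[start:start + window]
        dim = len(chunk[0])
        pooled.append([sum(f[d] for f in chunk) / window for d in range(dim)])
    return pooled

def multi_scale_embed(frames, scales=(1, 2, 4)):
    """Return one pooled sequence per temporal scale, a stand-in for the
    multiple MT-EM branches."""
    return {s: temporal_pool(frames, s) for s in scales}

# Toy sequence: 8 frames of 2-D features.
frames = [[float(t), float(t % 2)] for t in range(8)]
branches = multi_scale_embed(frames)
# scale 1 keeps all 8 steps, scale 2 yields 4 pooled steps, scale 4 yields 2.
```

In the actual model, each branch's output would be embedded by GCN blocks and routed through the lateral connection to a transformer encoder at the matching semantic level; here the branches are simply returned for inspection.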