Representation modeling learning with multi-domain decoupling for unsupervised skeleton-based action recognition

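Decoupling (probability); Computer science; Artificial intelligence; Representation (politics); Feature learning; Skeleton (computer programming); Pattern recognition (psychology); Action recognition; Domain (mathematical analysis); Machine learning; Mathematics; Class (philosophy); Mathematical analysis; Control engineering; Politics; Law; Political science; Engineering; Programming language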

Authors

Zhihai He,Jinglei Lv,Shixiong Fang

Source

Journal: Neurocomputing [Elsevier BV]
Date: 2024-05-01 Volume: 582, article 127495

Identifier

DOI: 10.1016/j.neucom.2024.127495

Abstract

Skeleton-based action recognition is a fundamental research topic in computer vision. In recent years, the unsupervised contrastive learning paradigm has achieved great success in this task. However, previous work has often treated input skeleton sequences as a whole when contrasting them, and so lacks fine-grained contrastive representation learning. We therefore propose Representation Modeling with Multi-domain Decoupling (RMMD), a contrastive learning method that extracts the most significant representations from input skeleton sequences in the temporal, spatial, and frequency domains separately. Specifically, in the temporal and spatial domains, we propose a multi-level spatiotemporal mining reconstruction module (STMR) that iteratively reconstructs the original input skeleton sequences to highlight the spatiotemporal representations of different actions. We also introduce position encoding and a global adaptive attention matrix, which balance global and local information and effectively model the spatiotemporal dependencies between joints. In the frequency domain, we use the discrete cosine transform (DCT) to convert the sequences from the temporal domain to the frequency domain, discard part of the interference information, and apply frequency self-attention (FSA) and a multi-level aggregation perceptron (MLAP) to explore the frequency-domain representation in depth. Fusing the temporal, spatial, and frequency domain representations makes our model more discriminative across different actions. We verify the effectiveness of the model on the NTU RGB+D and PKU-MMD datasets. Extensive experiments show that our method outperforms existing unsupervised methods and achieves significant performance improvements in downstream tasks such as action recognition and action retrieval.
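The temporal-to-frequency step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which coefficients RMMD discards, so the code below assumes, purely for illustration, that the high-frequency coefficients are the ones dropped. The function names (`dct_ortho`, `idct_ortho`, `truncate_frequencies`) and the `keep` parameter are hypothetical, and the DCT is written out in plain Python for a single joint-coordinate trajectory rather than a full skeleton tensor.

```python
import math


def dct_ortho(signal):
    """Orthonormal DCT-II of a 1-D sequence (one joint coordinate over time)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * (t + 0.5) * k / n)
            for t, x in enumerate(signal)
        )
        # Scaling that makes the transform orthonormal (energy preserving).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def idct_ortho(coeffs):
    """Inverse of dct_ortho (orthonormal DCT-III)."""
    n = len(coeffs)
    signal = []
    for t in range(n):
        total = math.sqrt(1.0 / n) * coeffs[0]
        total += sum(
            math.sqrt(2.0 / n) * coeffs[k] * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        signal.append(total)
    return signal


def truncate_frequencies(trajectory, keep):
    """Zero every DCT coefficient with index >= keep.

    ASSUMPTION: dropping the high-frequency end is an illustrative choice,
    not something stated in the abstract.
    """
    coeffs = dct_ortho(trajectory)
    return [c if k < keep else 0.0 for k, c in enumerate(coeffs)]


if __name__ == "__main__":
    # Hypothetical x-coordinate of one joint over 8 frames, with small jitter.
    x = [0.00, 0.12, 0.19, 0.33, 0.38, 0.52, 0.59, 0.71]
    low_freq = truncate_frequencies(x, keep=3)
    smoothed = idct_ortho(low_freq)
    print([round(v, 3) for v in smoothed])
```

Because the transform is orthonormal, zeroing coefficients removes exactly the energy those coefficients carried, which is one reason a DCT is a convenient place to drop components treated as interference. In practice the same transform would be applied along the time axis for every joint and coordinate channel.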
