Computer science
Action recognition
Action (physics)
Artificial intelligence
Physics
Quantum mechanics
Class (philosophy)
Authors
Xiaoqi Gao,Zhaobin Chang,Xingcheng Ran,Yonggang Lu
Identifier
DOI:10.1016/j.knosys.2024.111852
Abstract
Attention mechanisms play a crucial role in improving action recognition performance. A video, as a form of 3D data, can be effectively explored using attention mechanisms along the temporal, spatial, and channel dimensions. However, existing methods based on 2D CNNs typically model complex spatiotemporal information along only one or two of these dimensions, which ultimately limits their performance. In this paper, we propose a novel Comprehensive Attention Network (CANet) to adaptively model spatiotemporal information in all three dimensions. CANet is composed of three core plug-and-play components, namely the Global Guided Short-term Motion Module (GG-SMM), the Second-order Guided Long-term Motion Module (SG-LMM), and the Spatial Motion Adaptive Module (SMAM). Specifically, (1) the GG-SMM module is designed to represent local motion cues in the short-term temporal dimension, improving classification accuracy on fast-tempo actions. (2) The SG-LMM module jointly excites fine-grained motion information in the long-term temporal and channel dimensions, thereby facilitating the discrimination of long-term motions. (3) The SMAM module represents motion-sensitive regions in the spatial dimension by learning spatial object offsets. Extensive experiments have been conducted on four widely used action recognition benchmarks, namely Something-Something V1, Kinetics-400, UCF-101, and HMDB-51. Experimental results demonstrate that the proposed CANet achieves excellent performance compared with other state-of-the-art methods.
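The core idea of attending to a video feature map along the temporal, spatial, and channel dimensions can be illustrated with a minimal sketch. The snippet below is a generic, hypothetical illustration (not the authors' GG-SMM/SG-LMM/SMAM designs, which are not specified here): it pools a toy `(T, C, H, W)` tensor along all axes except one, turns the pooled vector into softmax attention weights, and re-weights the features along each dimension in turn.

```python
import numpy as np

def axis_attention(x, axis):
    # Global-average-pool over every axis except `axis`, then apply a
    # softmax to obtain per-index attention weights along that axis.
    # This is a generic sketch, not the modules proposed in the paper.
    other = tuple(i for i in range(x.ndim) if i != axis)
    pooled = x.mean(axis=other)                # shape: (x.shape[axis],)
    w = np.exp(pooled - pooled.max())          # numerically stable softmax
    w /= w.sum()
    shape = [1] * x.ndim
    shape[axis] = -1
    return w.reshape(shape)                    # broadcastable weight tensor

# Toy video feature map: (T, C, H, W) = (frames, channels, height, width)
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 7, 7))

# Re-weight the features along the temporal (axis 0), channel (axis 1),
# and one spatial axis (axis 2), loosely analogous to attending over all
# three dimensions of a video as the abstract describes.
out = x * axis_attention(x, 0) * axis_attention(x, 1) * axis_attention(x, 2)
print(out.shape)  # (8, 16, 7, 7)
```

Because each weight tensor keeps singleton dimensions everywhere except its own axis, the three attentions broadcast against the input and can be composed multiplicatively without changing the feature map's shape, which is what makes such modules "plug-and-play" in a 2D CNN backbone.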