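Computer science
Action recognition
Action (physics)
Motion (physics)
Artificial intelligence
Pattern recognition (psychology)
Computer vision
Physics
Quantum mechanics
Class (philosophy)
Authors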
Xiaotian Lu,Sicheng Zhao,Lechao Cheng,Yitao Zheng,Xueqiao Fan,Mingli Song
Identifier
DOI:10.1016/j.knosys.2024.111686
Abstract
The dual-stream architecture is frequently employed for learning diverse features from videos. This paper introduces a novel Mixed Resolution Network (MixRes) that processes inputs with hybrid spatiotemporal resolutions: one input with high spatial and low temporal resolution, and one with low spatial and high temporal resolution. Using mixed spatiotemporal resolutions lets the two streams focus independently on appearance and motion encoding, and it also reduces the computational burden. Furthermore, by leveraging the multi-layer structure of neural networks, the temporal stream of the proposed network is divided into different steps to capture short-term and long-term motion information. Finally, we design a Temporal Multiscale Motion Excitation (TMME) module, which enhances the motion-related channels of the video representation by employing multiscale temporal differences. We conduct extensive experiments on multiple action recognition benchmarks, including Something-Something V1 & V2 and Kinetics-400. The results show that the proposed method achieves superior action recognition performance at low computational cost compared with state-of-the-art methods.
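The two ideas named in the abstract, mixed-resolution inputs and channel excitation from multiscale temporal differences, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the strides, the difference scales (1, 2, 4), the average pooling, and the sigmoid gate with a residual connection are all assumptions chosen only to make the idea concrete.

```python
import numpy as np


def make_mixed_resolution_inputs(video, temporal_stride=4, spatial_stride=2):
    """Split one clip into two inputs (hypothetical helper, not from the paper).

    video: float array of shape (T, H, W, C).
    Returns (appearance_input, motion_input).
    """
    # High spatial, low temporal resolution: full-size frames, every k-th frame.
    appearance = video[::temporal_stride]

    # Low spatial, high temporal resolution: all frames, average-pooled in space.
    t, h, w, c = video.shape
    s = spatial_stride
    h2, w2 = h // s, w // s
    cropped = video[:, : h2 * s, : w2 * s]
    motion = cropped.reshape(t, h2, s, w2, s, c).mean(axis=(2, 4))
    return appearance, motion


def multiscale_motion_excitation(features, scales=(1, 2, 4)):
    """Re-weight channels by how much they change over time (assumed design).

    features: float array of shape (T, C, H, W).
    Returns an array of the same shape.
    """
    t = features.shape[0]
    pooled = features.mean(axis=(2, 3))  # per-frame channel descriptor, (T, C)

    motion = np.zeros(pooled.shape[1])
    used = 0
    for s in scales:
        if s >= t:
            continue  # skip scales longer than the clip
        diff = pooled[s:] - pooled[:-s]  # temporal difference at stride s
        motion += np.abs(diff).mean(axis=0)
        used += 1
    if used:
        motion /= used

    # Sigmoid gate per channel; static channels get 0.5, moving ones approach 1.
    gate = 1.0 / (1.0 + np.exp(-motion))
    # Residual excitation keeps the original signal and boosts motion channels.
    return features * (1.0 + gate[None, :, None, None])
```

For a 16-frame clip with the default strides, the first function yields 4 full-resolution frames for the appearance stream and 16 half-resolution frames for the motion stream, which is where the reduction in computation would come from under these assumptions.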