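Convolutional neural network
Computer science
Artificial intelligence
Action recognition
Action (physics)
Deep learning
Search engine indexing
Pattern recognition (psychology)
Machine learning
Class (philosophy)
Quantum mechanics
Physics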
Authors
Guangle Yao, Tao Lei, Jiandan Zhong
Identifier
DOI:10.1016/j.patrec.2018.05.018
Abstract
Video action recognition is widely applied in video indexing, intelligent surveillance, multimedia understanding, and other fields. Recently, it has been greatly improved by incorporating the learning of deep information using Convolutional Neural Networks (CNNs). This motivated us to review the notable CNN-based action recognition works. Because CNNs are primarily designed to extract 2D spatial features from still images, whereas videos are naturally viewed as 3D spatiotemporal signals, the core issue in extending CNNs from images to videos is the exploitation of temporal information. We divide the solutions for exploiting temporal information into three strategies: 1) 3D CNN; 2) taking motion-related information as the CNN input; and 3) fusion. In this paper, we present a comprehensive review of CNN-based action recognition methods according to these strategies. We also discuss action recognition performance on recent large-scale benchmarks, as well as the limitations and future research directions of CNN-based action recognition. This paper offers an objective and clear review of CNN-based action recognition and provides a guide for future research.
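The contrast the abstract draws between 2D spatial filtering and 3D spatiotemporal filtering can be illustrated with a toy example. The following is a minimal sketch in plain Python, not code from the reviewed paper: the function `conv3d_valid`, the toy clip, and both kernels are hypothetical names and values chosen for illustration. It shows that a kernel spanning more than one frame responds to change over time, while a kernel with temporal extent 1 (the 2D case) treats each frame independently.

```python
def conv3d_valid(video, kernel):
    """Naive 'valid' 3D cross-correlation over nested lists indexed [time][row][col].

    Illustrative helper (an assumption, not an API from any library).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the kernel's temporal and spatial extent
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip: 3 frames of 2x2 pixels whose brightness rises by 1 per frame
video = [[[float(t)] * 2 for _ in range(2)] for t in range(3)]

# Kernel spanning 2 frames and 1x1 pixels: next frame minus current frame
temporal_diff = [[[-1.0]], [[1.0]]]
motion = conv3d_valid(video, temporal_diff)

# Kernel with temporal extent 1 is the per-frame (2D) special case
spatial_only = conv3d_valid(video, [[[1.0]]])
```

Here `motion` has one fewer frame than the input and encodes the frame-to-frame change, whereas `spatial_only` keeps the frame count and carries no information about how neighboring frames relate. Real 3D CNNs learn such kernels rather than fixing them by hand.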