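Computer science
Artificial intelligence
Deep learning
Fuse (electrical)
Context (archaeology)
Recurrent neural network
Convolutional neural network
Feature (linguistics)
Pattern recognition (psychology)
Artificial neural network
Paleontology
Linguistics
Philosophy
Electrical engineering
Biology
Engineering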
Identifier
DOI:10.1016/j.patrec.2021.08.017
Abstract
The paper investigates long short-term memory (LSTM) networks for human action recognition in videos. Despite significant progress in the field, recognizing actions in real-world videos remains challenging because of the spatial and temporal variations within and across video clips. We propose a novel two-stream deep network for action recognition that uses an LSTM to learn the fusion of the spatial and temporal feature streams. The LSTM, a type of recurrent neural network, is designed to preserve long-range context in temporal streams. The proposed method capitalizes on the LSTM's memory to fuse the input streams in a high-dimensional space, exploring their spatial and temporal correlations. The temporal stream input is defined on LSTM-learned deep features that summarize the input frame sequence. Combining the spatial stream, based on convolutional features, with the temporal stream, based on deep features, in an LSTM network efficiently captures the long-range temporal dependencies in video streams. We perform a primary evaluation of the proposed approach on the UCF101, HMDB51 and Kinetics400 datasets, achieving competitive recognition accuracies of 93.1%, 71.3% and 74.6%, respectively.
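To make the fusion idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that per-frame spatial and temporal feature vectors are simply concatenated and fed to a single LSTM cell, whose final hidden state serves as the fused clip representation. The abstract does not specify how the streams enter the LSTM, so this concatenation, the names `TinyLSTMCell` and `fuse_streams`, the feature sizes, and the random untrained weights are all illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Hypothetical single LSTM cell with random, untrained weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, c = candidate cell state.
        self.W = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                for _ in range(hidden_size)]
            for g in "ifoc"
        }
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _affine(self, gate, z):
        return [
            sum(w * v for w, v in zip(row, z)) + bias
            for row, bias in zip(self.W[gate], self.b[gate])
        ]

    def step(self, x, h, c):
        z = list(x) + list(h)  # current input joined with previous hidden state
        i = [sigmoid(a) for a in self._affine("i", z)]
        f = [sigmoid(a) for a in self._affine("f", z)]
        o = [sigmoid(a) for a in self._affine("o", z)]
        g = [math.tanh(a) for a in self._affine("c", z)]
        # The cell state carries long-range context across frames.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def fuse_streams(spatial_seq, temporal_seq, cell):
    """Assumed fusion: concatenate both streams per frame, run the LSTM,
    and return the final hidden state as the fused representation."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for s, t in zip(spatial_seq, temporal_seq):
        h, c = cell.step(list(s) + list(t), h, c)
    return h


# Placeholder features: 5 frames, 4-dim spatial and 3-dim temporal vectors.
rng = random.Random(42)
spatial = [[rng.random() for _ in range(4)] for _ in range(5)]
temporal = [[rng.random() for _ in range(3)] for _ in range(5)]
fused = fuse_streams(spatial, temporal, TinyLSTMCell(input_size=7, hidden_size=6))
print(len(fused))
```

In a real system the spatial vectors would come from a convolutional network and the temporal vectors from the LSTM-learned frame-sequence features the abstract mentions, and the fused state would feed a trained classifier. Here both inputs are random placeholders.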