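Computer science
Artificial intelligence
Residual
Key (lock)
Action (physics)
Hidden Markov model
Convolutional neural network
Markov chain
Machine learning
Task (project management)
Probabilistic logic
Pattern recognition (psychology)
Markov model
Feature (linguistics)
Representation (politics)
Algorithm
Linguistics
Philosophy
Physics
Computer security
Management
Quantum mechanics
Politics
Political science
Law
Economics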
Authors
Yunfang Xu, Zengmao Wang, Xiaoping Zhang
Identifier
DOI:10.1016/j.neunet.2023.10.047
Abstract
Effective use of temporal relationships while extracting rich spatial features is the key to video action understanding. Video action understanding is a challenging visual task because it generally requires not only the features of individual key frames but also a contextual understanding of the entire video and of the relationships among key frames; modeling these temporal relationships is itself a challenge. Existing 3D convolutional neural network approaches, however, are limited because they carry a great deal of redundant spatial and temporal information. In this paper, we present a novel two-stream approach that incorporates Spatial Residual Attention and Temporal Markov (SRATM) to learn complementary features and achieve stronger video action understanding performance. Specifically, the proposed SRATM consists of a spatial residual attention network and a temporal Markov network. First, the spatial residual attention network captures an effective spatial feature representation. Second, the temporal Markov network enhances the model by learning temporal relationships through probabilistic logic calculation among the frames of a video. Finally, extensive experiments on four video action datasets, namely Something-Something-V1, Something-Something-V2, Diving48, and Mini-Kinetics, show that the proposed SRATM method achieves competitive results.
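The abstract does not give the architecture in detail, so the following is only an illustrative sketch of the two ideas it names, not the authors' SRATM implementation. It assumes a per-frame feature map of shape (C, H, W) and one descriptor vector per frame; the function names, the channel-mean attention score, and the similarity-based transition matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_residual_attention(feat):
    """Residual spatial attention on one frame (assumed form).

    feat: array of shape (C, H, W).
    The mask is a softmax over spatial positions of the channel-mean
    response; it is applied as feat * (1 + mask) so the original
    features are preserved by the residual path.
    """
    c, h, w = feat.shape
    scores = feat.mean(axis=0).reshape(-1)      # (H*W,)
    mask = softmax(scores).reshape(1, h, w)     # sums to 1 over positions
    return feat * (1.0 + mask)

def markov_transition(frame_vecs):
    """Row-stochastic frame-to-frame matrix (assumed form).

    frame_vecs: array of shape (T, D), one descriptor per frame.
    P[i, j] is read as the probability of moving from frame i to
    frame j, derived here from scaled dot-product similarity.
    """
    d = frame_vecs.shape[1]
    sim = frame_vecs @ frame_vecs.T / np.sqrt(d)
    return softmax(sim, axis=1)

def temporal_markov_step(frame_vecs):
    # One first-order Markov step: each frame's context is the
    # expectation of the frame descriptors under its transition row.
    return markov_transition(frame_vecs) @ frame_vecs
```

In this sketch the residual form keeps the unattended features intact while emphasizing salient positions, and each row of the transition matrix is a probability distribution over frames, which is the property a first-order Markov chain requires.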