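Computer science
Softmax function
Artificial intelligence
Convolutional neural network
Pattern recognition (psychology)
Recurrent neural network
Feature (linguistics)
Optical flow
Feature extraction
Deep learning
Kernel (algebra)
Artificial neural network
Image (mathematics)
Combinatorics
Philosophy
Linguistics
Mathematics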
Authors
Jiecheng Zhai,Xunxiang Yao,Guangyuan Dong,Qun Jiang,Yunfeng Zhang
Identifier
DOI:10.1109/cisce55963.2022.9851166
Abstract
Human action recognition is a supervised task in computer vision: labeling an entire video sequence with an action label. Unlike recognition in static images, it also requires learning the relationships between video frames, such as the temporal characteristics that reflect how an action changes over the video. In existing deep learning methods, the size of the convolution kernel limits the models to a small number of consecutive frames as input, so they are trained to assign feature vectors to short sequences rather than to the entire sequence. Therefore, even if the learned features contain temporal information, their evolution over time is completely ignored. In this work, we propose a dual-stream deep fusion framework that can fully utilize the long-term information in a video. We preprocess the video into static frames and optical flow images and feed them into a three-dimensional convolutional neural network to obtain a time-ordered spatiotemporal feature stream. The spatiotemporal feature stream is then fed into a simple recurrent unit (SRU) network to learn long-term sequence features along the time dimension. Finally, a softmax classifier is used for classification. We tested our model on the classic action recognition datasets UCF-101 and HMDB-51, and it achieved better performance than existing methods.
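The recurrent and classification stages described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes the commonly cited simple recurrent unit (SRU) recurrence (forget gate, reset gate, and a highway connection to the input) without the later peephole-style terms, and it uses random vectors as stand-ins for the per-clip features that a 3D CNN would produce. All names (`sru_forward`, `classify`, `make_params`, `rand_matrix`) and all sizes are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def sru_forward(features, W, Wf, bf, Wr, br):
    """Run an assumed SRU recurrence over a sequence of feature vectors.

    features: list of T vectors, each of dimension d (stand-ins for 3D-CNN
    clip features). The highway term requires input and hidden sizes to match.
    Returns the hidden state after the last time step.
    """
    d = len(features[0])
    c = [0.0] * d  # internal cell state carried across the whole sequence
    h = [0.0] * d
    for x in features:
        x_tilde = matvec(W, x)
        f = [sigmoid(a + b) for a, b in zip(matvec(Wf, x), bf)]  # forget gate
        r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), br)]  # reset gate
        c = [fi * ci + (1.0 - fi) * xi for fi, ci, xi in zip(f, c, x_tilde)]
        h = [ri * math.tanh(ci) + (1.0 - ri) * xi
             for ri, ci, xi in zip(r, c, x)]
    return h


def classify(features, params, W_out):
    """Map the final SRU state to class probabilities with softmax."""
    h = sru_forward(features, *params)
    return softmax(matvec(W_out, h))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_params(d, rng):
    """Random, untrained SRU parameters (W, Wf, bf, Wr, br) for illustration."""
    return (rand_matrix(d, d, rng), rand_matrix(d, d, rng), [0.0] * d,
            rand_matrix(d, d, rng), [0.0] * d)


# Hypothetical sizes: 6 clips per video, 8-dim features, 5 action classes.
rng = random.Random(0)
clip_features = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(6)]
probs = classify(clip_features, make_params(8, rng), rand_matrix(5, 8, rng))
print("predicted class index:", probs.index(max(probs)))
```

Because the cell state `c` is updated across every clip in the sequence, the final state depends on the whole video rather than on a single short window, which is the property the abstract attributes to the recurrent stage. The weights here are untrained, so the predicted class is arbitrary.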