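Computer science
Artificial intelligence
Feature extraction
Action recognition
Convolutional neural network
Bilinear interpolation
Pattern recognition (psychology)
Computer vision
Optical flow
Feature (linguistics)
Image (mathematics)
Class (philosophy)
Linguistics
Philosophy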
Authors
Yang Wang,X. Y. Shen,H. S. Chen,Jian Sun
Identifier
DOI:10.1134/s105466182103024x
Abstract
Traditional human action recognition algorithms based on feature extraction are complicated, which leads to low recognition accuracy. We present an algorithm for recognizing human actions in videos based on spatio-temporal fusion with 3D convolutional neural networks (3D CNNs). The algorithm contains two subnetworks that extract deep spatial and temporal information, respectively, and a bilinear fusion strategy is applied to obtain the final fused spatio-temporal information. Spatial information is represented by a gradient feature, and temporal information is represented by optical flow. By constructing a new 3D CNN, deep features can be extracted from the fused spatio-temporal information from multiple angles. The proposed algorithm is compared with current mainstream algorithms on the KTH and UCF101 datasets, and the results show that it is effective and achieves high recognition accuracy.
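The abstract does not spell out the exact form of the bilinear fusion. As a minimal sketch, assuming the standard bilinear pooling used in bilinear CNNs (per-location outer product of the two streams' features, averaged over locations, then signed square root and L2 normalization), and assuming the spatial (gradient) and temporal (optical-flow) subnetworks output feature maps with the same temporal and spatial resolution, the fusion step could look like the code below. The function names `bilinear_fuse` and `to_locations` and the feature-map shapes are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def to_locations(fmap: np.ndarray) -> np.ndarray:
    """Flatten a hypothetical 3D CNN feature map of shape (C, T, H, W)
    into (T*H*W, C): one C-dimensional feature per spatio-temporal location."""
    channels = fmap.shape[0]
    return fmap.reshape(channels, -1).T


def bilinear_fuse(spatial: np.ndarray, temporal: np.ndarray,
                  eps: float = 1e-12) -> np.ndarray:
    """Bilinear pooling of two streams (assumed formulation, not the paper's).

    spatial:  (L, C1) features from the spatial stream at L locations
    temporal: (L, C2) features from the temporal stream at the same L locations
    Returns a fused vector of length C1 * C2 with unit L2 norm.
    """
    if spatial.ndim != 2 or temporal.ndim != 2:
        raise ValueError("expected 2-D arrays of shape (locations, channels)")
    if spatial.shape[0] != temporal.shape[0]:
        raise ValueError("both streams must share the same locations")
    # Average of per-location outer products: (C1, L) @ (L, C2) -> (C1, C2)
    pooled = spatial.T @ temporal / spatial.shape[0]
    x = pooled.reshape(-1)
    # Signed square root dampens very large activations
    x = np.sign(x) * np.sqrt(np.abs(x))
    # L2 normalization; eps guards against division by zero
    return x / max(np.linalg.norm(x), eps)


# Example with random stand-ins for the two subnetworks' outputs
rng = np.random.default_rng(0)
spatial_map = rng.standard_normal((64, 4, 7, 7))   # hypothetical gradient stream
temporal_map = rng.standard_normal((32, 4, 7, 7))  # hypothetical optical-flow stream
fused = bilinear_fuse(to_locations(spatial_map), to_locations(temporal_map))
print(fused.shape)  # (2048,)
```

Because the outer product pairs every spatial channel with every temporal channel, the fused vector captures their interactions, at the cost of a dimensionality of C1 × C2. How the paper then feeds the fused information into its new 3D CNN is not described in the abstract, so this sketch stops at the fused vector.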