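Computer science
Convolutional neural network
Artificial intelligence
RGB color model
Optical flow
Pattern recognition (psychology)
Convolution (computer science)
Feature (linguistics)
Action recognition
Benchmarking
Computer vision
Artificial neural network
Image (mathematics)
Philosophy
Business
Marketing
Linguistics
Class (philosophy)
Authors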
Jun Chen, Yuanping Xu, Chaolong Zhang, Zhijie Xu, Xiangxiang Meng, Jie Wang
Identifier
DOI:10.23919/iconac.2019.8894962
Abstract
To capture global contextual information accurately from videos with heavy camera motion and scene changes, this study proposes an improved spatiotemporal two-stream neural network architecture with a novel convolutional fusion layer. It makes three main improvements: 1) a ResNet-101 network is integrated into each of the two streams independently; 2) the two kinds of feature maps (optical-flow motion and RGB-channel information), obtained from the corresponding convolution layers of the two streams, are superimposed on each other; 3) the temporal information is combined with the spatial information by an integrated three-dimensional (3D) convolutional neural network (CNN) to extract more latent information from the videos. The proposed approach was tested on the UCF-101 and HMDB51 benchmark datasets, and the experimental results show that the proposed two-stream 3D CNN model achieves a substantial improvement in recognition rate for video-based analysis.
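The fusion idea in points 2) and 3) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes that "superimposed" means stacking the two streams' feature maps along the channel axis, it uses constant toy volumes in place of ResNet-101 features, and all function names (`stack_streams`, `conv3d_valid`, `constant_volume`) are hypothetical. The single 3D kernel spans both channels and two consecutive frames, so each output value mixes spatial (RGB) and temporal (optical-flow) information.

```python
from itertools import product


def shape(vol):
    """Return (channels, frames, height, width) of a nested-list volume."""
    return len(vol), len(vol[0]), len(vol[0][0]), len(vol[0][0][0])


def stack_streams(rgb_maps, flow_maps):
    """Fuse two [C][T][H][W] volumes by concatenating along the channel axis.

    Assumption: "superimposed" is read here as channel stacking.
    """
    assert shape(rgb_maps)[1:] == shape(flow_maps)[1:], "T, H, W must match"
    return rgb_maps + flow_maps


def conv3d_valid(vol, kernel):
    """One-output-channel 3D cross-correlation, stride 1, no padding."""
    c, t, h, w = shape(vol)
    kc, kt, kh, kw = shape(kernel)
    assert c == kc, "kernel must span every fused channel"
    out_t, out_h, out_w = t - kt + 1, h - kh + 1, w - kw + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(out_t)]
    for ot, oy, ox in product(range(out_t), range(out_h), range(out_w)):
        out[ot][oy][ox] = sum(
            vol[ch][ot + dt][oy + dy][ox + dx] * kernel[ch][dt][dy][dx]
            for ch, dt, dy, dx in product(range(c), range(kt), range(kh), range(kw))
        )
    return out


def constant_volume(value, c, t, h, w):
    """Toy stand-in for a stream's feature maps (hypothetical helper)."""
    return [[[[value] * w for _ in range(h)] for _ in range(t)] for _ in range(c)]


rgb = constant_volume(1, c=1, t=3, h=2, w=2)     # spatial stream
flow = constant_volume(2, c=1, t=3, h=2, w=2)    # temporal stream
fused = stack_streams(rgb, flow)                 # shape (2, 3, 2, 2)
kernel = constant_volume(1, c=2, t=2, h=2, w=2)  # spans 2 frames, 2x2 pixels
print(conv3d_valid(fused, kernel))               # [[[24]], [[24]]]
```

Each output value is 24 because the all-ones kernel sums eight RGB entries of 1 and eight flow entries of 2. In practice a deep-learning framework's 3D convolution layer with learned weights would replace this loop.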