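Computer science
Softmax function
Convolutional neural network
Artificial intelligence
Pattern recognition (psychology)
Convolution (computer science)
Optical flow
RGB color model
Artificial neural network
Image (mathematics)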
Authors
Wei Dai, Yimin Chen, Chen Huang, Mingke Gao, Xinyu Zhang
Identifier
DOI:10.1109/ijcnn.2019.8851702
Abstract
As applications of convolutional neural networks in artificial intelligence become increasingly diverse, a growing number of neural network methods have been proposed. Examples include 3D convolution and two-stream convolution based on RGB and optical flow. A convolutional neural network with 3D convolutional kernels can extract spatio-temporal features directly from a set of video sequences for action recognition. Because a 3D convolutional neural network captures only part of the spatio-temporal information, we propose a new ConvNet architecture called CVDN (Combined Video-stream Deep Network), which extracts more spatio-temporal features from video fragments and so makes effective use of the temporal information in the dataset. We evaluate our method on the UCF-101 dataset. The method works as follows. First, we initialize our models with ResNet models pre-trained on the Kinetics dataset, then train them on UCF-101 and extract the video-stream features. Next, optical flow images computed from the UCF-101 dataset serve as the input to the optical stream, from which the optical features are extracted. Finally, the features of the two streams are combined, and the results are obtained after a Softmax layer. CVDN obtains good results when the linear fusion ratio of video-stream features to optical-stream features is 5:4. With ResNet-101, our method achieves an accuracy of 92.2%.
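The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract gives only the 5:4 ratio and the final Softmax layer, so everything else here is an assumption for illustration: the function names `softmax` and `fuse_streams` are hypothetical, the per-class scores are made-up numbers, and dividing the weighted sum by the total weight is one possible reading of "linear fusion", not necessarily the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of per-class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(video_scores, flow_scores, video_weight=5.0, flow_weight=4.0):
    """Linearly fuse two streams' per-class scores, then apply softmax.

    The default weights mirror the 5:4 ratio reported in the abstract.
    Normalizing by the weight sum is an assumption of this sketch.
    """
    if len(video_scores) != len(flow_scores):
        raise ValueError("both streams must score the same number of classes")
    norm = video_weight + flow_weight
    fused = [
        (video_weight * v + flow_weight * f) / norm
        for v, f in zip(video_scores, flow_scores)
    ]
    return softmax(fused)


# Hypothetical scores for three action classes (illustrative values only).
video_scores = [2.0, 0.5, -1.0]
flow_scores = [0.0, 3.0, -1.0]

probs = fuse_streams(video_scores, flow_scores)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

In this sketch the fusion is applied to class scores before the softmax. Fusing intermediate feature vectors ahead of a learned classification layer would be an equally plausible reading of the abstract.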