Authors
Dayin Yang, Hongyun Xiong, Xiaohong Nian, Zhao Li
Identifier
DOI:10.1109/ctisc54888.2022.9849822
Abstract
Convolutional networks are widely used in action recognition. However, because action recognition takes videos containing large numbers of frames as input, these networks are often highly complex and computationally expensive, placing high demands on hardware. Moreover, the diversity of action classes and the multi-scale nature of actions in the temporal and spatial domains make accurate recognition challenging. In this paper, we use depthwise convolution and kernel factorization to design a lightweight spatiotemporal feature extraction structure that reduces network computational complexity, and, considering the diversity of human actions across temporal and spatial scales, we use a convolutional pyramid structure with multiple convolution kernels to extract multi-scale features. We name the proposed structure the multi-scale depthwise (MSD) module. We embed the MSD module in a two-stream convolutional network and call the result the multi-scale depthwise convolutional network (MSDCN). Experiments on the human action datasets UCF101, HMDB51, and Kinetics yield accuracies of 92.13%, 65.90%, and 72.73%, respectively. The results show that the proposed MSD module is effective and that MSDCN achieves comparable results. In addition, MSDCN has very low parameter and computation counts, more than 60% lower than those of the baseline network.
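To see why depthwise convolution combined with kernel factorization cuts cost so sharply, the parameter counts can be compared directly. The sketch below is an illustration of the general technique, not the authors' exact MSD layer composition: it assumes the spatiotemporal kernel is split into a depthwise spatial convolution, a depthwise temporal convolution, and a pointwise (1x1) channel-mixing convolution, with the channel counts and kernel sizes chosen as example values.

```python
def conv3d_params(c_in, c_out, t, k):
    """Standard 3D convolution: every output channel mixes all input
    channels across a full t x k x k spatiotemporal kernel."""
    return c_in * c_out * t * k * k

def factorized_depthwise_params(c_in, c_out, t, k):
    """Assumed factorization (for illustration): a depthwise k x k
    spatial conv, a depthwise t x 1 x 1 temporal conv, and a 1x1
    pointwise conv that mixes channels."""
    depthwise_spatial = c_in * k * k   # one k x k filter per channel
    depthwise_temporal = c_in * t      # one length-t filter per channel
    pointwise = c_in * c_out           # 1x1 conv mixing channels
    return depthwise_spatial + depthwise_temporal + pointwise

# Example values (hypothetical, not taken from the paper):
c_in, c_out, t, k = 64, 64, 3, 3
standard = conv3d_params(c_in, c_out, t, k)            # 110592
factorized = factorized_depthwise_params(c_in, c_out, t, k)  # 4864
print(f"standard: {standard}, factorized: {factorized}, "
      f"reduction: {1 - factorized / standard:.1%}")
```

Even at modest channel widths the factorized form uses well under 10% of the parameters of the full 3D kernel, which is consistent in spirit with the more-than-60% overall network reduction reported above (the whole-network figure is smaller because other layers are shared with the baseline).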