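Computer science
Convolution (computer science)
Benchmark (surveying)
Architecture
Layer (electronics)
Action recognition
Frame (networking)
Artificial intelligence
Pattern recognition (psychology)
Scale (ratio)
Scaling
Action (physics)
Data mining
Artificial neural network
Mathematics
Art
Telecommunications
Chemistry
Geometry
Geodesy
Organic chemistry
Visual arts
Geography
Physics
Quantum mechanics
Class (philosophy)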
Authors
Tushar Singh, Anunay Anunay, Vayam Jain, Anand Singh, Nilesh Aggarwal, Sarthak Singh
Identifier
DOI:10.1109/icacr59381.2023.10314578
Abstract
This work presents E3D, an architecture obtained by scaling the 2D EfficientNet architecture up to three dimensions. E3D is a simple yet effective network built around a custom, novel depthwise 3D convolution layer on which the entire architecture depends. The depthwise convolution layer is designed to learn spatial and temporal information simultaneously from a large-scale dataset of processed frame sequences by leveraging and scaling EfficientNetB0. This homogeneous architecture classifies the actions in two publicly available action classification datasets, HMDB-51 (51 action classes) and UCF-101 (101 action classes). The manuscript further explains how E3D works and how its trained features outperform other state-of-the-art classification approaches in performance, efficiency and deployment characteristics, achieving 98.74% accuracy on UCF-101 and 86.49% on HMDB-51. The proposed architecture is also computationally cheaper than other action recognition architectures.
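The abstract does not specify how the custom layer is built, so the following is only a minimal sketch of a generic depthwise 3D convolution, not the authors' E3D layer. It assumes no padding, stride 1, and plain nested lists indexed [channel][time][height][width]; the function names `depthwise_conv3d` and `conv3d_weight_counts` are hypothetical. The weight-count helper further assumes the depthwise step is paired with a 1x1x1 pointwise convolution, as is usual in EfficientNet-style blocks, to illustrate why such layers tend to be cheaper than a standard 3D convolution.

```python
from itertools import product


def depthwise_conv3d(clip, kernels):
    """Valid (no padding), stride-1 depthwise 3D convolution.

    clip:    nested list indexed [channel][t][y][x]
    kernels: nested list indexed [channel][kt][ky][kx], one kernel per channel
    Each channel is filtered only by its own kernel; channels are never mixed.
    """
    out = []
    for chan, kern in zip(clip, kernels):
        T, H, W = len(chan), len(chan[0]), len(chan[0][0])
        kt, kh, kw = len(kern), len(kern[0]), len(kern[0][0])
        offsets = list(product(range(kt), range(kh), range(kw)))
        out.append([
            [
                [
                    # Weighted sum over the spatio-temporal neighbourhood.
                    sum(chan[t + dt][y + dy][x + dx] * kern[dt][dy][dx]
                        for dt, dy, dx in offsets)
                    for x in range(W - kw + 1)
                ]
                for y in range(H - kh + 1)
            ]
            for t in range(T - kt + 1)
        ])
    return out


def conv3d_weight_counts(c_in, c_out, k):
    """Weight counts (biases ignored) for a k*k*k kernel.

    Returns (standard 3D conv, depthwise 3D conv + 1x1x1 pointwise conv).
    The pointwise pairing is an assumption, not taken from the abstract.
    """
    standard = c_in * c_out * k ** 3
    depthwise_separable = c_in * k ** 3 + c_in * c_out
    return standard, depthwise_separable
```

As a usage example, a 2-channel clip of four 4x4 frames filtered with 3x3x3 kernels yields a 2-channel output of two 2x2 frames, and `conv3d_weight_counts(32, 64, 3)` compares the weight counts for a layer mapping 32 channels to 64.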