Keywords
Computer science
Artificial intelligence
Convolutional neural network
Encoder
Deep learning
Computer vision
Margin (machine learning)
Action recognition
Pattern recognition (psychology)
Task (project management)
Image (mathematics)
Image sensor
Machine learning
Management
Economics
Class (philosophy)
Operating system
Authors
Sudhakar Kumawat, Tadashi Okawara, Michitaka Yoshida, Hajime Nagahara, Yasushi Yagi
Identifiers
DOI: 10.1109/tpami.2022.3196350
Abstract
The unprecedented success of deep convolutional neural networks (CNNs) on the task of video-based human action recognition assumes the availability of good-resolution videos and the resources to develop and deploy complex models. Unfortunately, budgetary and environmental constraints on the camera system and the recognition model may not accommodate these assumptions and can require reducing their complexity. To alleviate these issues, we introduce a deep sensing solution that directly recognizes human actions from coded exposure images. Our deep sensing solution consists of a binary CNN-based encoder network that emulates the capture of a coded exposure image of a dynamic scene by a coded exposure camera, followed by a 2D CNN that recognizes the human action in the captured coded exposure image. Furthermore, we propose a novel knowledge distillation framework to jointly train the encoder and the action recognition model, and we show that the proposed training approach improves the action recognition accuracy by an absolute margin of 6.2%, 2.9%, and 7.9% on the Something-Something v2, Kinetics-400, and UCF-101 datasets, respectively, in comparison to our previous approach. Finally, we built a prototype coded exposure camera using liquid crystal on silicon (LCoS) to validate the feasibility of our deep sensing solution. Our evaluation of the prototype camera shows results that are consistent with the simulation results.
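Two parts of the method lend themselves to short illustrations. First, the encoder emulates a coded exposure camera: each pixel integrates light only during the frames selected by a binary exposure code, collapsing a short clip into a single image. The following PyTorch sketch shows this image-formation step under generic assumptions; the function name, tensor shapes, and the fixed random code are illustrative (in the paper, the binary pattern is produced by a learned binary CNN encoder, which is not reproduced here).

```python
import torch

def coded_exposure(video: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
    """Collapse a T-frame clip into one coded exposure image.

    video: (B, T, H, W) grayscale frames in [0, 1].
    code:  (T, H, W) binary per-pixel exposure pattern (illustrative;
           the paper learns this pattern with a binary CNN encoder).
    """
    # A pixel integrates light only in frames where its code bit is 1,
    # emulating a coded exposure camera such as an LCoS prototype.
    coded = (video * code.unsqueeze(0)).sum(dim=1)
    # Normalize by the number of open-shutter frames per pixel so that
    # intensities stay comparable across pixels (guard against /0).
    return coded / code.sum(dim=0).clamp(min=1.0).unsqueeze(0)

# Toy usage: an 8-frame clip and a random binary exposure code.
video = torch.rand(2, 8, 64, 64)           # (B, T, H, W)
code = (torch.rand(8, 64, 64) > 0.5).float()
image = coded_exposure(video, code)        # (2, 64, 64) coded image
```

Second, the encoder and recognition model are trained jointly with knowledge distillation. The abstract does not give the exact formulation, so the sketch below is a generic Hinton-style distillation loss in which the coded-image 2D CNN (student) fits the ground-truth labels while matching the softened predictions of a stronger video-based teacher; `temperature` and `alpha` are hypothetical hyperparameters, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Hard-label cross-entropy on the student's own predictions.
    ce = F.cross_entropy(student_logits, labels)
    # KL divergence between temperature-softened teacher and student
    # distributions; the T^2 factor keeps gradient magnitudes stable.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kd
```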