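Computer science
Convolutional neural network
Transformer
Bottleneck
Recurrent neural network
Artificial intelligence
Deep learning
Embedding
Machine learning
Artificial neural network
Data mining
Real-time computing
Pattern recognition (psychology)
Engineering
Embedded system
Electrical engineering
Voltage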
Identifier
DOI:10.1109/comnetsat53002.2021.9530829
Abstract
In smart cities, detecting violent events is critical to city safety. Several studies on this topic use a 2D convolutional neural network (2D-CNN) to extract spatial features from each frame, followed by one of the recurrent neural network (RNN) variants to learn temporal features. Transformer networks, meanwhile, have achieved strong results in many areas, but their bottleneck is the need for a large dataset to achieve good results. In this work, we propose a data-efficient video transformer (DeVTr) that uses the transformer network as a spatio-temporal learning method, with a pre-trained 2D-CNN as the embedding layer for the input data. The model was trained and tested on the Real-life violence dataset (RLVS) and achieved an accuracy of 96.25%. A comparison with previous techniques showed that the proposed method gives the best result among all the other studies for violence event detection.
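The data flow described in the abstract (a per-frame embedding, then attention across frames, then a clip-level decision) can be illustrated with a toy sketch in plain Python. This is not the DeVTr implementation: `embed_frame` is a hypothetical linear stand-in for the pre-trained 2D-CNN, the attention uses a single head with identity query/key/value projections, and the mean pooling, logistic head, random weights, and all sizes are assumptions chosen only to keep the example small.

```python
import math
import random


def embed_frame(frame, weights):
    """Hypothetical stand-in for the pre-trained 2D-CNN embedding layer:
    maps one flattened frame to a feature vector with a linear projection."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention over frame tokens.
    Assumption: identity Q/K/V projections, to keep the sketch short."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        attn = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(attn, tokens)) for i in range(d)])
    return out


def classify_clip(frames, embed_weights, head_weights):
    """Embed each frame, mix information across time, pool, and score the clip."""
    tokens = [embed_frame(f, embed_weights) for f in frames]   # spatial features per frame
    mixed = self_attention(tokens)                             # temporal mixing across frames
    pooled = [sum(col) / len(mixed) for col in zip(*mixed)]    # mean pooling (assumed)
    logit = sum(w * x for w, x in zip(head_weights, pooled))   # logistic head (assumed)
    return 1.0 / (1.0 + math.exp(-logit))


# Toy input: 8 frames of 16 "pixels" each, embedded into 4 dimensions.
rng = random.Random(0)
num_frames, pixels, dim = 8, 16, 4
frames = [[rng.random() for _ in range(pixels)] for _ in range(num_frames)]
embed_weights = [[rng.uniform(-0.5, 0.5) for _ in range(pixels)] for _ in range(dim)]
head_weights = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

score = classify_clip(frames, embed_weights, head_weights)
print(f"clip score from untrained random weights: {score:.3f}")
```

The score printed here is meaningless because the weights are random. The sketch only shows why a pre-trained embedding can reduce the data requirement: the attention stage receives compact per-frame features rather than raw pixels, so only the temporal part has to be learned from the video dataset.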