Towards Hierarchical Temporal Excitation for Video Violence Recognition
Computer Science
Authors
Aihua Mao, Wanqing Wu, Wenwei Yan, Yuxiang Li, Haoxiang Wang
Source
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence [Institute of Electrical and Electronics Engineers]; Date: 2025-01-01; Pages: 1-14
Identifier
DOI:10.1109/tetci.2024.3522201
Abstract
Video-based violence recognition has become a crucial research topic with the widespread use of surveillance cameras. However, recognizing violent behavior from video data is challenging because of the additional temporal dimension, the lack of a precisely defined extent of violent behavior, and complex backgrounds that make it difficult to recognize interactions between objects. Previous works reason ambiguously about temporal features and have an insufficient understanding of action relationships. To address these issues, we propose a hierarchical temporal excitation network that learns deep object interactions from spatio-temporal information and uses these interactions to identify violent behavior robustly, even in complex scenarios. The proposed model comprises two temporal excitation modules: the shift temporal adaptive module (STAM) and the sparse object interaction transformer module (SOI-Tr). STAM extracts coarse-grained temporal information by fusing a shift component with a temporal adaptive modeling component. Furthermore, since deep-layer temporal features are more conducive to network understanding, SOI-Tr is introduced to excite fine-grained temporal representation reasoning through attention to critical objects. We conduct extensive experiments on mainstream violence datasets and on a newly constructed multi-class violence (MCV) dataset. The results show that our method outperforms state-of-the-art methods and is superior in understanding object interactions for violent behavior recognition.
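The abstract names two mechanisms without giving their internals. The sketch below is NOT the authors' STAM or SOI-Tr. It only illustrates, under stated assumptions, two generic building blocks they appear related to. First, a TSM-style channel shift along the time axis, which is assumed here as one reading of a "shift component" (the split ratio `fold_div=8` is a hypothetical choice). Second, top-k sparse attention over per-object features, which is assumed here as one plausible reading of "critical object attention". Both function names are hypothetical, and NumPy is used only for illustration.

```python
import numpy as np

# Hypothetical illustration only; not the STAM/SOI-Tr implementation from the paper.

def temporal_shift(x: np.ndarray, fold_div: int = 8) -> np.ndarray:
    """TSM-style temporal shift on features of shape (T, C, H, W).

    Assumption: channels [0, C/fold_div) take the next frame's values,
    channels [C/fold_div, 2C/fold_div) take the previous frame's values,
    and the remaining channels are left unchanged. Ends are zero-padded.
    """
    _, c, _, _ = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                  # frame t sees frame t+1
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]  # frame t sees frame t-1
    out[:, 2 * fold:] = x[:, 2 * fold:]             # untouched channels
    return out


def topk_object_attention(queries: np.ndarray, keys: np.ndarray,
                          values: np.ndarray, k: int):
    """Scaled dot-product attention where each query keeps only its k
    highest-scoring objects (assumed form of sparse object attention).

    queries: (Nq, d), keys: (No, d), values: (No, dv).
    Returns (output of shape (Nq, dv), weights of shape (Nq, No)).
    """
    assert 1 <= k <= keys.shape[0], "k must be between 1 and the number of objects"
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)                # (Nq, No)
    kth = np.sort(scores, axis=-1)[:, -k][:, None]        # k-th largest per row
    masked = np.where(scores >= kth, scores, -np.inf)     # drop non-critical objects
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(masked)                              # exp(-inf) == 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights
```

For example, `temporal_shift(np.zeros((8, 64, 7, 7)))` returns an array of the same shape, so such a shift can be inserted into a 2D backbone at no parameter cost. In `topk_object_attention`, objects outside the top k receive exactly zero weight, which is what makes the attention sparse.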