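Action recognition
Convolution (computer science)
Dimension (graph theory)
Frame (networking)
Computer science
Action (physics)
Task (project management)
Pattern recognition (psychology)
Artificial intelligence
Spatial intelligence
Mathematics
Artificial neural network
Physics
Economics
Pure mathematics
Management
Telecommunications
Class (philosophy)
Quantum mechanics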
Authors
Yangjun Ou,Zhenzhong Chen
Identifier
DOI:10.1016/j.jvcir.2023.103804
Abstract
Modeling and reasoning about the interactions between multiple entities (actors and objects) benefit the action recognition task. In this paper, we propose a 3D Deformable Convolution Temporal Reasoning (DCTR) network to model and reason about the latent relationship dependencies between different entities in videos. The proposed DCTR network consists of a spatial modeling module and a temporal reasoning module. The spatial modeling module uses 3D deformable convolution to capture relationship dependencies between different entities in the same frame, while the temporal reasoning module uses Conv-LSTM to reason about how these dependencies among multiple entities change along the temporal dimension. Experiments on the Moments-in-Time, UCF101, and HMDB51 datasets demonstrate that the proposed method outperforms several state-of-the-art methods.
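To make the temporal reasoning idea concrete, the following is a minimal sketch of a generic Conv-LSTM update, written in plain Python for single-channel 2D maps. It is not the authors' implementation: the function names (`conv2d_same`, `conv_lstm_step`, `run_conv_lstm`), the single-channel setting, the `params` layout, and the omission of peephole connections are all assumptions made for illustration. In the DCTR network the per-frame inputs would presumably be feature maps from the spatial modeling module; here they are simply small numeric grids.

```python
import math

def conv2d_same(x, k):
    """Zero-padded 2D cross-correlation of grid x with an odd-sized square kernel k."""
    h, w = len(x), len(x[0])
    r = len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += x[ii][jj] * k[di + r][dj + r]
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def conv_lstm_step(x, h, c, params):
    """One Conv-LSTM update (no peephole terms; an illustrative simplification).

    params maps each gate name ("i", "f", "o", "g") to a tuple
    (input kernel, hidden kernel, scalar bias). This layout is hypothetical.
    """
    rows, cols = len(x), len(x[0])
    pre = {}
    for gate in ("i", "f", "o", "g"):
        kx, kh, b = params[gate]
        ax, ah = conv2d_same(x, kx), conv2d_same(h, kh)
        pre[gate] = [[ax[r][q] + ah[r][q] + b for q in range(cols)]
                     for r in range(rows)]
    c_new = [[0.0] * cols for _ in range(rows)]
    h_new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for q in range(cols):
            i_g = sigmoid(pre["i"][r][q])    # input gate
            f_g = sigmoid(pre["f"][r][q])    # forget gate
            o_g = sigmoid(pre["o"][r][q])    # output gate
            g = math.tanh(pre["g"][r][q])    # candidate cell content
            c_new[r][q] = f_g * c[r][q] + i_g * g
            h_new[r][q] = o_g * math.tanh(c_new[r][q])
    return h_new, c_new

def run_conv_lstm(frames, params):
    """Feed a sequence of equally sized maps through the cell; return the last hidden map."""
    rows, cols = len(frames[0]), len(frames[0][0])
    h = [[0.0] * cols for _ in range(rows)]
    c = [[0.0] * cols for _ in range(rows)]
    for x in frames:
        h, c = conv_lstm_step(x, h, c, params)
    return h
```

Because the gates are computed by convolutions rather than dense matrix products, the hidden and cell states keep the spatial layout of the input, which is what lets a Conv-LSTM track how spatially localized relationships evolve from frame to frame.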