期刊:ACM Transactions on Intelligent Systems and Technology [Association for Computing Machinery] 日期:2021-06-29卷期号:12 (3): 1-24被引量:6
标识
DOI:10.1145/3447686
摘要
In this article, we study the problem of video-based action recognition. We improve the action recognition performance by finding an effective temporal and appearance representation. For capturing the temporal representation, we introduce two temporal learning techniques for improving long-term temporal information modeling, specifically Temporal Relational Network and Temporal Second-Order Pooling-based Network. Moreover, we harness the representation using complementary learning techniques, specifically Global-Local Network and Fuse-Inception Network. Performance evaluation on three datasets (UCF101, HMDB-51, and Mini-Kinetics-200) demonstrated the superiority of the proposed framework compared to the 2D Deep ConvNets-based state-of-the-art techniques.