Computer science
Dual (grammatical number)
Scale (ratio)
Action (physics)
Artificial intelligence
Action recognition
Fusion
Computer vision
Pattern recognition (psychology)
Geography
Art
Linguistics
Philosophy
Physics
Literature
Cartography
Quantum mechanics
Class (philosophy)
Authors
Yingying Chen,Yanfang Wang,Chang Li,Q. Li,Qian Huang
Identifier
DOI:10.1109/icn60549.2023.10426122
Abstract
RGB video-based action recognition supports a wide range of applications because the rich appearance information in RGB video enables accurate and robust performance. In recent years, convolutional neural networks have developed rapidly and achieved strong results in action recognition, yet they still cannot adequately extract fine-grained information, and even with two modalities it is difficult for spatio-temporal learning to be effectively complementary. In this paper, we propose a dual-stream multi-scale fusion method. The method constructs different fine-grained representations of key features through a key-feature extraction module and near-by fusion, further extracting and enhancing multi-scale information. In the multi-scale cross fusion, temporal gradients carrying motion information interact with the RGB stream to strengthen modal complementarity. The final result fuses multi-scale representations within modalities and higher-order similarities between modalities, yielding fine-grained learning of both appearance and motion. Compared with other commonly used methods, the algorithm proposed in this paper shows significant improvement on the UCF101 and HMDB51 datasets, achieving 94.12% and 72.55% accuracy, respectively.
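The abstract's dual-stream design pairs RGB frames with their temporal gradients as a motion modality and then fuses the two streams. A minimal sketch of that pairing (not the authors' network: the frame-difference gradient and the weighted late fusion below are simplified assumptions for illustration):

```python
import numpy as np

def temporal_gradient(frames):
    """Approximate motion information as frame-to-frame differences.

    frames: (T, H, W, C) array of RGB frames.
    Returns a (T-1, H, W, C) array of temporal gradients.
    """
    return frames[1:] - frames[:-1]

def dual_stream_fusion(rgb_scores, motion_scores, alpha=0.5):
    """Late-fuse per-stream class scores by weighted averaging.

    rgb_scores, motion_scores: (num_classes,) score vectors, one per stream.
    alpha: weight given to the RGB stream (assumed value, not from the paper).
    """
    return alpha * rgb_scores + (1.0 - alpha) * motion_scores

# Hypothetical usage on a tiny random clip:
frames = np.random.rand(8, 4, 4, 3)          # 8 frames of 4x4 RGB
motion = temporal_gradient(frames)           # shape (7, 4, 4, 3)
fused = dual_stream_fusion(np.array([0.9, 0.1]), np.array([0.3, 0.7]))
```

In the actual method, each stream would pass through its own convolutional backbone before fusion; the weighted average here only illustrates how the two modalities' outputs are combined.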