Artificial intelligence
RGB color model
Computer science
Fuse (electrical)
Pattern recognition (psychology)
Feature extraction
Subspace topology
Preprocessor
Perceptron
Computer vision
Artificial neural network
Machine learning
Engineering
Electrical engineering
Authors
Weiyao Xu, Muqing Wu, Min Zhao, Ting Xia
Source
Journal: IEEE Sensors Journal
[Institute of Electrical and Electronics Engineers]
Date: 2021-09-01
Volume/Issue: 21 (17): 19157-19164
Cited by: 32
Identifier
DOI: 10.1109/jsen.2021.3089705
Abstract
The output of Microsoft Kinect is a multimodal signal that provides RGB videos, depth sequences, and skeleton information simultaneously, opening up new opportunities for research on human action recognition. However, given the different single modalities of the signal, how to exploit and fuse useful features from these various sources remains a very challenging problem. Most methods for RGB-D action recognition simply concatenate the multimodal features, ignoring the potential semantic relationships between different modalities. In this paper, we propose a multimodal action recognition model based on a Bilinear Pooling and Attention Network (BPAN), which can effectively fuse multimodal features for RGB-D action recognition. First, we adopt efficient data-preprocessing methods for the RGB and skeleton data. Then, we propose a multimodal fusion network combining RGB video and skeleton sequences. The proposed BPAN module effectively compresses the RGB and skeleton features and projects them into a latent subspace to obtain the fusion features. Finally, a fully connected three-layer perceptron is adopted to produce the classification decision. Experimental results on three public datasets demonstrate that our proposed method achieves more favorable performance than state-of-the-art methods.
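The fusion pipeline described in the abstract (bilinear pooling of two modality features, projection into a latent subspace, then a three-layer perceptron) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, the signed-square-root normalization, and the random weights are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (the paper does not specify these here).
d_rgb, d_skel, d_latent, n_classes = 64, 32, 16, 10

# Stand-ins for per-modality feature vectors from backbone networks.
f_rgb = rng.standard_normal(d_rgb)
f_skel = rng.standard_normal(d_skel)

# Bilinear pooling: the outer product captures pairwise interactions
# between every RGB feature and every skeleton feature.
bilinear = np.outer(f_rgb, f_skel).ravel()  # shape (d_rgb * d_skel,)

# Signed square-root + L2 normalization, a common post-processing
# step for bilinear features (an assumption, not from the abstract).
bilinear = np.sign(bilinear) * np.sqrt(np.abs(bilinear))
bilinear /= np.linalg.norm(bilinear) + 1e-8

# Project the large bilinear vector into a low-dimensional latent
# subspace to obtain the compact fusion feature.
W_proj = rng.standard_normal((d_latent, bilinear.size)) * 0.01
fused = np.tanh(W_proj @ bilinear)  # shape (d_latent,)

# Three-layer perceptron classifier with illustrative widths.
def mlp(x):
    relu = lambda z: np.maximum(z, 0.0)
    W1 = rng.standard_normal((32, d_latent)) * 0.1
    W2 = rng.standard_normal((32, 32)) * 0.1
    W3 = rng.standard_normal((n_classes, 32)) * 0.1
    h = relu(W1 @ x)
    h = relu(W2 @ h)
    logits = W3 @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax class probabilities

probs = mlp(fused)
```

In practice the outer product is expensive (d_rgb × d_skel entries), which is why the abstract emphasizes that BPAN compresses the features before fusion; compact bilinear pooling variants approximate this product in a much lower dimension.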