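Computer science
Artificial intelligence
RGB color model
Computer vision
Activity recognition
Convolutional neural network
Pattern recognition (psychology)
Feature (linguistics)
Support vector machine
Image (mathematics)
Linguistics
Philosophy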
Authors
Dimitrios Koutrintzes,Evaggelos Spyrou,Eirini Mathe,Phivos Mylonas
Identifier
DOI:10.1142/s0129065723500028
Abstract
Human activity recognition (HAR) has attracted growing attention from the research community because of its many applications. The task consists of recognizing human motion and/or behavior within a given image or video sequence, using raw sensor measurements as input. In this paper, a multimodal approach to video-based HAR is proposed. It is based on 3D visual data collected with an RGB + depth camera, resulting in both raw video and 3D skeletal sequences. These data are transformed into six different 2D image representations. Five of them are based on skeletal data: four are in the spectral domain and one is a pseudo-colored image. The sixth is a "dynamic" image, an artificially created image that summarizes the RGB data of the whole video sequence in a visually comprehensible way. To classify a given activity video, all six 2D images are first extracted, and six trained convolutional neural networks are then used to extract visual features from them. These features are fused into a single feature vector, which is fed into a support vector machine for classification into human activities. For evaluation, a challenging motion activity recognition dataset is used, and single-view, cross-view and cross-subject experiments are performed. Moreover, the proposed approach is compared to three other state-of-the-art methods, demonstrating superior performance in most experiments.
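The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that fusion means plain concatenation of the six per-representation feature vectors in a fixed order; the abstract does not state the fusion operator, so this is an assumption. The modality names, the `fuse_features` helper, and the toy feature values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical names: the abstract only says there are four spectral-domain
# images, one pseudo-colored image, and one "dynamic" RGB image.
MODALITIES = (
    "spectral_1", "spectral_2", "spectral_3", "spectral_4",
    "pseudo_colored", "dynamic_rgb",
)


def fuse_features(per_modality: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate one feature vector per modality, always in the same order.

    Concatenation is an assumed fusion operator, not the paper's confirmed one.
    """
    missing = [name for name in MODALITIES if name not in per_modality]
    if missing:
        raise KeyError(f"missing features for: {missing}")
    fused: List[float] = []
    for name in MODALITIES:
        fused.extend(float(x) for x in per_modality[name])
    return fused


# Toy stand-in for CNN outputs: each network yields a 2-dimensional vector.
toy_features = {
    name: [float(i), float(i) + 0.5] for i, name in enumerate(MODALITIES)
}
fused_vector = fuse_features(toy_features)
```

In a full pipeline, a vector like `fused_vector` would be computed for every video and the resulting matrix passed to a support vector machine classifier. Keeping the modality order fixed matters because the classifier treats each position in the vector as a distinct feature.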