Facial expression
Computer science
Emotion recognition
Artificial intelligence
Pattern
Speech recognition
Pattern recognition (psychology)
Affective computing
Computer vision
Social science
Sociology
Authors
Yassine Ouzar, Frédéric Bousefsaf, Djamaleddine Djeldjli, Choubeila Maaoui
Identifier
DOI:10.1109/cvprw56347.2022.00275
Abstract
Human affective state recognition remains a challenging topic due to the complexity of emotions, which involve experiential, behavioral, and physiological elements. Since it is difficult to comprehensively describe emotion with a single modality, recent studies have focused on fusion strategies that exploit the complementarity of multimodal signals. In this article, we study the feasibility of fusing facial expressions with physiological cues to improve human emotion recognition accuracy. The contributions of this work are threefold: 1) We propose a new spatiotemporal network for facial expression recognition based on a 3D squeeze-and-excitation Xception architecture (squeeze-and-excitation Xception network). 2) We present the first multimodal fusion approach that relies on a single input source: to the best of our knowledge, no existing multimodal emotion recognition system has attempted to identify emotional state from facial videos alone using both facial expression and physiological signal features. 3) We compare the performance of unimodal approaches, using only facial expressions or only physiological data, against multimodal systems fusing facial expressions with video-based physiological cues. In our experiments, physiological signals such as the iPPG signal and heart rate variability features measured remotely with the imaging photoplethysmography (iPPG) method are used. Preliminary results show that the multimodal fusion model improves emotion recognition accuracy, and that merging facial expression features with the iPPG signal yields the best accuracy, at 71.90%.
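The abstract's facial-expression branch builds on a 3D squeeze-and-excitation block. As a rough illustration of the channel-recalibration idea behind such a block (not the authors' implementation: the function name, reduction ratio, and random weights below are all assumptions for the sketch), the squeeze step pools each channel of a spatiotemporal feature map into a single descriptor, and the excitation step passes those descriptors through a small bottleneck MLP with a sigmoid gate that rescales every channel:

```python
import numpy as np

def squeeze_excite_3d(x, w1, b1, w2, b2):
    """Channel recalibration for a 3D feature map x of shape (C, T, H, W).

    Squeeze: global average pooling over the spatiotemporal axes.
    Excite: bottleneck MLP (ReLU, then sigmoid) producing per-channel gates.
    Scale: multiply each channel of x by its gate in (0, 1).
    """
    z = x.mean(axis=(1, 2, 3))                 # squeeze -> (C,)
    s = np.maximum(z @ w1 + b1, 0.0)           # bottleneck -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))   # gates -> (C,)
    return x * s[:, None, None, None]          # reweighted feature map

# Toy example: C=4 channels, reduction ratio r=2, random weights.
rng = np.random.default_rng(0)
C, r = 4, 2
x = rng.standard_normal((C, 3, 8, 8))
w1 = rng.standard_normal((C, C // r)); b1 = np.zeros(C // r)
w2 = rng.standard_normal((C // r, C)); b2 = np.zeros(C)
y = squeeze_excite_3d(x, w1, b1, w2, b2)
```

In a real network the two weight matrices are learned, and the block sits after a 3D convolution so the gating can emphasize channels that respond to emotion-relevant spatiotemporal patterns.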