Concatenation (mathematics)
Modality (human–computer interaction)
Computer science
Pattern
Redundancy (engineering)
Artificial intelligence
Feature (linguistics)
Pattern recognition (psychology)
Fusion
Transformer
Speech recognition
Mathematics
Engineering
Social science
Linguistics
Philosophy
Combinatorics
Voltage
Sociology
Electrical engineering
Operating system
Authors
Jing Li, Ning Chen, Hongqing Zhu, Guangqiang Li, Zhangyong Xu, Dingxin Chen
Identifier
DOI: 10.1016/j.inffus.2023.102220
Abstract
Various physiological signals can objectively reflect human emotional states. How to exploit the common as well as the complementary properties of different physiological signals in representing emotional states is an interesting problem. Although various models have been constructed to fuse multimodal physiological signals for emotion recognition, the possible incongruity among different physiological signals in representing emotional states and the redundancy resulting from the fusion, both of which may seriously degrade the performance of the fusion schemes, have seldom been considered. To this end, a fusion model is proposed that can eliminate the incongruity among different physiological signals and reduce the information redundancy to some extent. First, one physiological signal is chosen as the primary modality owing to its prominent performance in emotion recognition, and the remaining physiological signals are viewed as auxiliary modalities. Second, the Cross Modal Transformer (CMT) is adopted to optimize the features of the auxiliary modalities by eliminating the incongruity among them, and Low Rank Fusion (LRF) is then performed to eliminate the information redundancy caused by fusion. Third, the modified CMT (MCMT) is constructed to enhance the primary modality feature with each optimized auxiliary modality feature. Fourth, a Self-Attention Transformer (SAT) is applied to the concatenation of all the enhanced primary modality features to take full advantage of their common as well as complementary properties in representing emotional states. Finally, the enhanced primary modality feature and the optimized auxiliary features are fused by concatenation for emotion recognition. Extensive experimental results on the DEAP and WESAD datasets demonstrate that: i) incongruity does exist among different physiological signals, and the CMT-based auxiliary modality feature optimization strategy can eliminate it prominently; ii) the emotion prediction accuracy of the primary modality can be enhanced by the auxiliary modalities; iii) all the key modules in the proposed model (CMT, LRF, and MCMT) contribute to its performance; and iv) the proposed model outperforms state-of-the-art (SOTA) models on the emotion recognition task.
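The abstract outlines a multi-stage fusion pipeline: CMT aligns the auxiliary modalities, LRF fuses them with reduced redundancy, MCMT enhances the primary modality with each optimized auxiliary, SAT attends over the concatenated enhanced primary features, and a final concatenation feeds the classifier. The following is a minimal PyTorch sketch of that flow, assuming two auxiliary modalities and pre-extracted feature sequences; all module names (CrossModalBlock, LowRankFusion, FusionEmotionModel), layer sizes, and pooling choices are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of the abstract's fusion pipeline (CMT -> LRF -> MCMT -> SAT
# -> concatenation). Everything here is an illustrative assumption: module
# names, layer sizes, and the use of exactly two auxiliary modalities are
# NOT taken from the paper's published architecture.
import torch
import torch.nn as nn


class CrossModalBlock(nn.Module):
    """Cross-attention block: the query modality attends to a context modality."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(query, context, context)   # cross-modal attention
        x = self.norm1(query + out)                   # residual + norm
        return self.norm2(x + self.ff(x))


class LowRankFusion(nn.Module):
    """Low-rank bilinear fusion of two pooled feature vectors (redundancy reduction)."""

    def __init__(self, dim: int, rank: int = 4):
        super().__init__()
        self.proj_a = nn.Linear(dim, rank * dim)
        self.proj_b = nn.Linear(dim, rank * dim)
        self.rank, self.dim = rank, dim

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        fa = self.proj_a(a).view(-1, self.rank, self.dim)
        fb = self.proj_b(b).view(-1, self.rank, self.dim)
        return (fa * fb).sum(dim=1)                   # sum over the rank factors


class FusionEmotionModel(nn.Module):
    """Primary modality enhanced by two auxiliary modalities, then classified."""

    def __init__(self, dim: int = 64, n_classes: int = 2):
        super().__init__()
        self.aux_cmt = CrossModalBlock(dim)           # step 2a: align the auxiliaries (CMT)
        self.lrf = LowRankFusion(dim)                 # step 2b: low-rank fusion (LRF)
        self.mcmt = nn.ModuleList([CrossModalBlock(dim) for _ in range(2)])      # step 3 (MCMT)
        self.sat = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)    # step 4 (SAT)
        self.head = nn.Linear(2 * dim, n_classes)     # step 5: concatenate and classify

    def forward(self, primary: torch.Tensor, aux_a: torch.Tensor, aux_b: torch.Tensor):
        # All inputs are pre-extracted feature sequences of shape (batch, time, dim).
        opt_a = self.aux_cmt(aux_a, aux_b)            # each auxiliary attends to the other
        opt_b = self.aux_cmt(aux_b, aux_a)
        aux_fused = self.lrf(opt_a.mean(dim=1), opt_b.mean(dim=1))
        enhanced = [blk(primary, aux) for blk, aux in zip(self.mcmt, (opt_a, opt_b))]
        sat_out = self.sat(torch.cat(enhanced, dim=1)).mean(dim=1)
        return self.head(torch.cat([sat_out, aux_fused], dim=-1))


if __name__ == "__main__":
    model = FusionEmotionModel()
    eeg = torch.randn(8, 32, 64)       # hypothetical primary modality features
    ecg = torch.randn(8, 32, 64)       # hypothetical auxiliary modality features
    eda = torch.randn(8, 32, 64)
    print(model(eeg, ecg, eda).shape)  # -> torch.Size([8, 2])
```

The sketch pools sequences by simple averaging and uses a single classification head; the actual model presumably uses signal-specific feature extractors and its own CMT/LRF/MCMT/SAT designs, which the abstract does not specify.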