Keywords: canonical correlation, computer science, artificial intelligence, feature (linguistics), correlation, pattern recognition (psychology), deep learning, speech recognition, feature extraction, mathematics, geometry, linguistics, philosophy
Authors: Ke Zhang, Yuanqing Li, Jingyu Wang, Zhen Wang, Xuelong Li
Source
Journal: IEEE Signal Processing Letters (Institute of Electrical and Electronics Engineers)
Date: 2021-01-01
Volume/pages: 28: 1898-1902
Citations: 16
Identifier
DOI: 10.1109/LSP.2021.3112314
Abstract
Fusion of multimodal features is a central problem in video emotion recognition. With the development of deep learning, directly fusing the feature matrices of each modality through neural networks at the feature level has become the mainstream approach. However, unlike unimodal problems, multimodal analysis requires finding the correlations between different modalities, which is as important as discovering effective unimodal features. To address this deficiency in uncovering the intrinsic relationships between modalities, a novel modularized multimodal emotion recognition model based on deep canonical correlation analysis (MERDCCA) is proposed in this letter. In MERDCCA, four utterances are gathered into a new group, and each utterance contains text, audio, and visual information as multimodal input. Gated recurrent unit (GRU) layers are used to extract the unimodal features. A deep canonical correlation analysis based on an encoder-decoder network is designed to extract cross-modal correlations by maximizing the relevance between modalities. Experiments on two public datasets show that MERDCCA achieves better results.
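The abstract names two core components: GRU layers that extract per-modality features, and a deep CCA objective that maximizes correlation between modality representations. The sketch below is a minimal two-view illustration of that objective in PyTorch, following the standard DCCA loss of Andrew et al. (2013); it is not the paper's actual encoder-decoder design, and all names and dimensions here (ModalityEncoder, CCALoss, the toy feature sizes) are hypothetical.

import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """GRU encoder producing a fixed-size representation for one modality (hypothetical dims)."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):               # x: (batch, seq_len, in_dim)
        _, h = self.gru(x)              # h: (1, batch, hidden_dim), last hidden state
        return self.proj(h.squeeze(0))  # (batch, out_dim)

class CCALoss(nn.Module):
    """Negative total canonical correlation between two views (DCCA objective, Andrew et al. 2013)."""
    def __init__(self, out_dim, eps=1e-4):
        super().__init__()
        self.out_dim, self.eps = out_dim, eps

    def forward(self, H1, H2):          # H1, H2: (batch, out_dim) representations of two modalities
        n = H1.size(0)
        H1 = H1 - H1.mean(dim=0, keepdim=True)   # center each view
        H2 = H2 - H2.mean(dim=0, keepdim=True)
        I = self.eps * torch.eye(self.out_dim, device=H1.device)
        S12 = H1.t() @ H2 / (n - 1)              # cross-covariance
        S11 = H1.t() @ H1 / (n - 1) + I          # regularized within-view covariances
        S22 = H2.t() @ H2 / (n - 1) + I
        # Whiten via inverse matrix square roots from eigendecompositions.
        D1, V1 = torch.linalg.eigh(S11)
        D2, V2 = torch.linalg.eigh(S22)
        S11_inv_sqrt = V1 @ torch.diag(D1.clamp_min(self.eps).rsqrt()) @ V1.t()
        S22_inv_sqrt = V2 @ torch.diag(D2.clamp_min(self.eps).rsqrt()) @ V2.t()
        T = S11_inv_sqrt @ S12 @ S22_inv_sqrt
        # Singular values of T are the canonical correlations; maximize their sum.
        return -torch.linalg.svdvals(T).sum()

# Toy usage with hypothetical feature sizes: 16 groups of 4 utterances each.
text_enc, audio_enc = ModalityEncoder(300, 128, 32), ModalityEncoder(74, 128, 32)
loss = CCALoss(out_dim=32)(text_enc(torch.randn(16, 4, 300)),
                           audio_enc(torch.randn(16, 4, 74)))
loss.backward()

A training step would backpropagate this negative total correlation through both encoders; the paper's MERDCCA additionally handles three modalities and an encoder-decoder reconstruction path, which this two-view sketch omits.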