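Autoencoder
Decoding methods
Computer science
Modal verb
Multimodality
Artificial intelligence
Speech recognition
Pattern recognition (psychology)
Algorithm
Artificial neural network
World Wide Web
Chemistry
Polymer chemistry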
Authors
Cheng Cheng, Wenzhe Liu, Yong Zhang, Lin Feng, Ziyu Jia
Identifier
DOI:10.1109/tcss.2024.3415613
Abstract
Multimodal emotion recognition (MER) has recently gained much attention because it can leverage information across multiple modalities. However, in real-world settings modalities are often missing, and modeling the heterogeneity and correlation among multimodal data remains challenging. To this end, we propose a unified model called the cross-modal adaptive masked autoencoder (CMA-MAE) for incomplete multimodal learning. Our CMA-MAE model comprises a cross-modal adaptive fusion encoder (CMAFE) and a multiview adaptive encoder (MVAE), which capture and fuse the heterogeneity and correlation among multimodal features. Additionally, we design a convolutional decoder that progressively upsamples features and fuses them with modality-invariant features to generate robust emotional features from partially observed data. To make effective use of data with both complete and incomplete modalities for feature learning, we adopt an end-to-end approach that simultaneously optimizes the classification and reconstruction tasks. We evaluate our model through extensive experiments on the DEAP and SEED-IV datasets; the results demonstrate that CMA-MAE outperforms current leading approaches in both incomplete and complete multimodal learning scenarios.
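The end-to-end objective described in the abstract, which optimizes classification and reconstruction together, can be sketched roughly as follows. This is a minimal, framework-free illustration under stated assumptions, not the authors' implementation. The helper names (`mask_modalities`, `cross_entropy`, `masked_mse`, `joint_loss`), the weight `lam`, the toy modality names `eeg` and `eye`, and the choice to simulate missing modalities by randomly masking observed ones and scoring reconstruction only on those masked modalities are all hypothetical. The encoders (CMAFE, MVAE) and the convolutional decoder are not modeled; `logits` and `recon` simply stand in for their outputs.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical sketch of a joint classification + reconstruction loss.
# None of these names come from the paper; they are illustrative assumptions.

Features = Dict[str, List[float]]  # modality name -> feature vector


def mask_modalities(x: Features, drop_prob: float, rng: random.Random) -> Dict[str, bool]:
    """Randomly mark modalities as missing (True = masked), keeping at least one observed."""
    names = sorted(x)
    mask = {name: rng.random() < drop_prob for name in names}
    if all(mask.values()):
        # Assumption: never mask every modality; unmask one chosen at random.
        mask[rng.choice(names)] = False
    return mask


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def masked_mse(x: Features, recon: Features, mask: Dict[str, bool]) -> float:
    """Mean squared error over masked modalities only; 0.0 if nothing is masked.

    Assumes each reconstructed vector has the same length as its original.
    """
    errors = [
        (a - b) ** 2
        for name, masked in mask.items()
        if masked
        for a, b in zip(x[name], recon[name])
    ]
    return sum(errors) / len(errors) if errors else 0.0


def joint_loss(
    logits: Sequence[float],
    label: int,
    x: Features,
    recon: Features,
    mask: Dict[str, bool],
    lam: float = 1.0,
) -> float:
    """Classification loss plus a lam-weighted reconstruction loss (lam is an assumed weight)."""
    return cross_entropy(logits, label) + lam * masked_mse(x, recon, mask)


if __name__ == "__main__":
    rng = random.Random(0)
    x = {"eeg": [0.2, -0.1, 0.4], "eye": [1.0, 0.5]}  # toy feature vectors
    mask = mask_modalities(x, drop_prob=0.5, rng=rng)
    recon = {"eeg": [0.1, -0.1, 0.3], "eye": [0.8, 0.5]}  # stand-in for decoder output
    print(mask, joint_loss([2.0, 0.5, -1.0, 0.1], 0, x, recon, mask, lam=0.5))
```

In this sketch, masking modalities that were actually observed gives the reconstruction term a ground truth to compare against, while a genuinely missing modality could only contribute through the classification term. That is one plausible way to use complete and incomplete samples in a single objective; the paper's exact loss terms and weighting may differ.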