Computer science
Modality
Multimodal learning
Deep learning
Affective computing
Artificial intelligence
Multimodality
Feature learning
World Wide Web
Social science
Sociology
Authors
Shiqing Zhang, Yijiao Yang, Chen Chen, Xingnan Zhang, Qingming Leng, Xiaoming Zhao
Identifier
DOI:10.1016/j.eswa.2023.121692
Abstract
Emotion recognition has recently attracted extensive interest due to its significant applications in human–computer interaction. The expression of human emotion depends on various verbal and non-verbal channels such as audio, visual, and text. Emotion recognition is thus better suited as a multimodal rather than a single-modal learning problem. Owing to their powerful feature learning capability, deep learning methods have recently been widely leveraged to capture high-level emotional feature representations for multimodal emotion recognition (MER). This paper therefore makes the first effort to comprehensively summarize recent advances in deep learning-based multimodal emotion recognition (DL-MER) involving audio, visual, and text modalities. We focus on: (1) MER milestones are given to summarize the development tendency of MER, and commonly used multimodal emotion datasets are provided; (2) the core principles of typical deep learning models and their recent advances are overviewed; (3) a systematic survey and taxonomy are provided to cover the state-of-the-art methods related to the two key steps in an MER system: feature extraction and multimodal information fusion; (4) the research challenges and open issues in this field are discussed, and promising future directions are given.
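To make the two key steps named in the abstract concrete, the sketch below shows a minimal PyTorch model that first extracts a per-modality embedding (feature extraction) and then combines the three embeddings by concatenation (one simple multimodal fusion strategy). This is an illustrative sketch, not the paper's method; the input dimensions (40-d audio, e.g., MFCCs; 512-d visual; 768-d text, e.g., BERT embeddings), the 128-d embedding size, the six emotion classes, and the concatenation-based fusion are all assumptions chosen for the example.

```python
# Minimal sketch of an MER pipeline: per-modality feature extraction
# followed by concatenation-based multimodal fusion. All dimensions and
# the fusion strategy are illustrative assumptions, not the survey's method.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Maps one modality's raw feature vector to a shared embedding size."""

    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class ConcatFusionMER(nn.Module):
    """Fuses audio, visual, and text embeddings by concatenation,
    then classifies the fused vector into emotion categories."""

    def __init__(self, audio_dim=40, visual_dim=512, text_dim=768,
                 emb_dim=128, num_emotions=6):
        super().__init__()
        self.audio = ModalityEncoder(audio_dim, emb_dim)
        self.visual = ModalityEncoder(visual_dim, emb_dim)
        self.text = ModalityEncoder(text_dim, emb_dim)
        self.classifier = nn.Linear(emb_dim * 3, num_emotions)

    def forward(self, audio, visual, text) -> torch.Tensor:
        # Step 1: feature extraction per modality; Step 2: fusion by concat.
        fused = torch.cat(
            [self.audio(audio), self.visual(visual), self.text(text)], dim=-1
        )
        return self.classifier(fused)  # emotion logits


# Usage with random stand-in features for a batch of 4 utterances:
model = ConcatFusionMER()
logits = model(torch.randn(4, 40), torch.randn(4, 512), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 6])
```

Concatenation is only one point in the fusion design space the survey covers; attention-based or tensor-based fusion would replace the `torch.cat` step while leaving the per-modality encoders unchanged.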