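Computer science
Artificial intelligence
Machine learning
Modal verb
Ambiguity
Leverage (statistics)
Benchmark (surveying)
Automatic summarization
Emotion recognition
Supervised learning
Natural language processing
Artificial neural network
Chemistry
Polymer chemistry
Programming language
Geography
Geodesy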
Authors
Liping Zheng,Bin Liu,Jianhua Tao
Source
Journal: IEEE Transactions on Affective Computing
[Institute of Electrical and Electronics Engineers]
Date: 2023-07-01
Volume (issue): 14 (3): 2415-2429
Citations: 11
Identifier
DOI:10.1109/taffc.2022.3141237
Abstract
Conversational emotion recognition is a crucial research topic in human-computer interaction. Due to the heavy annotation cost and inevitable label ambiguity, collecting large amounts of labeled data is challenging and expensive, which restricts the performance of current fully-supervised methods in this domain. To address this problem, researchers attempt to distill knowledge from unlabeled data via semi-supervised learning. However, most of these semi-supervised methods ignore multimodal interactive information, although recent works have shown that such interactive information is essential for emotion recognition. To this end, we propose a novel framework to seamlessly integrate semi-supervised learning with multimodal interactions, called “Semi-supervised Multi-modal Interaction Network (SMIN)”. SMIN contains two well-designed semi-supervised modules, “Intra-modal Interactive Module (IIM)” and “Cross-modal Interactive Module (CIM)”, to learn intra- and cross-modal interactions. These two modules leverage additional unlabeled data to extract emotion-salient representations. To capture additional contextual information, we use hierarchical recurrent networks followed by a hybrid fusion strategy to integrate multimodal features. These multimodal features are further utilized for conversational emotion recognition. Experimental results on four benchmark datasets (i.e., IEMOCAP, MELD, CMU-MOSI and CMU-MOSEI) demonstrate that SMIN outperforms existing state-of-the-art strategies on emotion recognition.