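Speech recognition
Computer science
Discriminator
Classifier (UML)
Emotion recognition
Embedding
Smoothing
Natural language processing
Similarity (geometry)
Artificial intelligence
Telecommunications
Detector
Image (mathematics)
Computer vision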
Authors
Yinru He,Guihua Wen,Pei Yang,Dongliang Chen
Identifier
DOI:10.1109/icassp48485.2024.10446440
Abstract
Cross-corpus speech emotion recognition (SER) aims to identify human emotions from speech across different speakers and languages. Previous work has focused on extracting, from individual samples, the domain-invariant features most relevant to emotion, ignoring the rich relationships between speech instances, which also strongly influence the sentiments. To explore these relationships across multiple corpora, we introduce a novel cross-corpus SER architecture with speech relationship learning. Specifically, during training, we apply an attention mechanism over the entire input batch, embedding the sample-level similar features in emotion space into new representations. Furthermore, a dual discriminator structure is proposed to improve the similarity calculation through adversarial training, and a domain-wise shared classifier with a batch label smoothing strategy is proposed to enhance the generalization ability of the network. Experiments on the CASIA, EMODB and SAVEE datasets demonstrate that the proposed method outperforms state-of-the-art cross-corpus SER methods.
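The abstract describes applying attention over the whole input batch so that each sample's representation is rebuilt from similar samples. The abstract does not give the exact formulation, so the following is only a minimal sketch of generic scaled dot-product attention with the batch axis treated as the set of keys. The function name `batch_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all dimensions are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def batch_self_attention(x, w_q, w_k, w_v):
    """Attend across the samples of one batch (hypothetical sketch).

    x: array of shape (batch, dim), one feature vector per utterance.
    Returns the re-embedded features and the (batch, batch) weights.
    """
    q = x @ w_q  # queries, one per sample
    k = x @ w_k  # keys, one per sample
    v = x @ w_v  # values, one per sample

    # Pairwise sample-to-sample similarity, scaled as in standard attention.
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Row-wise softmax (shifted by the row maximum for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Each new representation is a weighted mix of all samples in the batch.
    return weights @ v, weights


# Toy data: 8 utterance embeddings of size 16 (assumed sizes).
rng = np.random.default_rng(0)
batch, dim = 8, 16
x = rng.standard_normal((batch, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

out, weights = batch_self_attention(x, w_q, w_k, w_v)
```

Here each row of `weights` sums to 1 and says how much every other utterance in the batch contributes to a given utterance's new representation. In a trained model the projections would be learned, and the batch would presumably mix samples from several corpora.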