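Discriminator
Computer science
Speech recognition
Emotion recognition
Invariant (physics)
Feature (linguistics)
Feature learning
Pattern recognition (psychology)
Feature extraction
Artificial intelligence
Domain (mathematical analysis)
Mathematics
Linguistics
Telecommunications
Mathematical analysis
Philosophy
Detector
Mathematical physics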
Authors
Cheng Lu, Yuan Zong, Wenming Zheng, Yang Li, Chuangao Tang, Björn Schuller
Source
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
[Institute of Electrical and Electronics Engineers]
Date: 2022-01-01
Volume: 30, pages 2217-2230
Citations: 27
Identifier
DOI:10.1109/taslp.2022.3178232
Abstract
In this paper, we propose a novel domain invariant feature learning (DIFL) method to deal with speaker-independent speech emotion recognition (SER). The basic idea of DIFL is to learn speaker-invariant emotion features by eliminating the domain shifts between the training and testing data caused by different speakers, from the perspective of multi-source unsupervised domain adaptation (UDA). Specifically, we embed a hierarchical alignment layer with a strong-weak distribution alignment strategy into the feature extraction block to first reduce, as far as possible, the discrepancy in feature distributions of speech samples across different speakers. Furthermore, multiple discriminators in the discriminator block are used to confuse the speaker information of emotion features both inside the training data and between the training and testing data. Through them, a multi-domain invariant representation of emotional speech can be gradually and adaptively achieved by updating network parameters. We conduct extensive experiments on three public datasets, i.e., Emo-DB, eNTERFACE, and CASIA, to evaluate the SER performance of the proposed method. The experimental results show that the proposed method outperforms state-of-the-art methods.
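The abstract's first step, reducing the discrepancy in feature distributions across speakers, can be illustrated with a much simpler classical stand-in: standardizing features separately per speaker. The sketch below is not the DIFL method (which, per the abstract, uses a learned hierarchical alignment layer and adversarial discriminators); the function names, the toy data, and the choice of z-score normalization are all assumptions made for illustration.

```python
# Illustrative sketch only: per-speaker feature standardization, NOT the DIFL
# method from the paper. All names and numbers here are hypothetical.
from statistics import mean, pstdev


def standardize_per_speaker(features, speakers):
    """Z-score each feature dimension separately within each speaker."""
    # Group sample indices by speaker label.
    groups = {}
    for idx, spk in enumerate(speakers):
        groups.setdefault(spk, []).append(idx)

    dims = len(features[0])
    out = [[0.0] * dims for _ in features]
    for indices in groups.values():
        for d in range(dims):
            column = [features[i][d] for i in indices]
            mu = mean(column)
            sigma = pstdev(column) or 1.0  # guard against zero variance
            for i in indices:
                out[i][d] = (features[i][d] - mu) / sigma
    return out


def speaker_mean_gap(features, speakers, a, b):
    """Largest absolute difference between two speakers' per-dimension means."""
    dims = len(features[0])
    gaps = []
    for d in range(dims):
        mean_a = mean(f[d] for f, s in zip(features, speakers) if s == a)
        mean_b = mean(f[d] for f, s in zip(features, speakers) if s == b)
        gaps.append(abs(mean_a - mean_b))
    return max(gaps)


# Toy data: two speakers whose features sit at different offsets and scales.
features = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0],
            [6.0, 30.0], [7.0, 32.0], [8.0, 34.0]]
speakers = ["s1", "s1", "s1", "s2", "s2", "s2"]

aligned = standardize_per_speaker(features, speakers)
gap_before = speaker_mean_gap(features, speakers, "s1", "s2")
gap_after = speaker_mean_gap(aligned, speakers, "s1", "s2")
```

On this toy data the gap between the two speakers' feature means shrinks to roughly zero after standardization. A fixed normalization like this only matches first- and second-order statistics and needs speaker labels at test time, which is one reason a learned, adversarially trained alignment such as the one the abstract describes may be preferable.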