Layer-Adapted Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

判别式正规化（语言学）人工智能计算机科学语音识别自然语言处理模式识别（心理学）

作者

Yan Zhao,Yuan Zong,Jincen Wang,Hailun Lian,Cheng Lu,Li Zhao,Wenming Zheng

出处

期刊：Cornell University - arXiv 日期：2023-01-01

链接

arxiv.org datacite.orgdoi.org

标识

DOI：10.48550/arxiv.2310.03992

摘要

In this paper, we propose a new unsupervised domain adaptation (DA) method called layer-adapted implicit distribution alignment networks (LIDAN) to address the challenge of cross-corpus speech emotion recognition (SER). LIDAN extends our previous ICASSP work, deep implicit distribution alignment networks (DIDAN), whose key contribution lies in the introduction of a novel regularization term called implicit distribution alignment (IDA). This term allows DIDAN trained on source (training) speech samples to remain applicable to predicting emotion labels for target (testing) speech samples, regardless of corpus variance in cross-corpus SER. To further enhance this method, we extend IDA to layer-adapted IDA (LIDA), resulting in LIDAN. This layer-adpated extention consists of three modified IDA terms that consider emotion labels at different levels of granularity. These terms are strategically arranged within different fully connected layers in LIDAN, aligning with the increasing emotion-discriminative abilities with respect to the layer depth. This arrangement enables LIDAN to more effectively learn emotion-discriminative and corpus-invariant features for SER across various corpora compared to DIDAN. It is also worthy to mention that unlike most existing methods that rely on estimating statistical moments to describe pre-assumed explicit distributions, both IDA and LIDA take a different approach. They utilize an idea of target sample reconstruction to directly bridge the feature distribution gap without making assumptions about their distribution type. As a result, DIDAN and LIDAN can be viewed as implicit cross-corpus SER methods. To evaluate LIDAN, we conducted extensive cross-corpus SER experiments on EmoDB, eNTERFACE, and CASIA corpora. The experimental results demonstrate that LIDAN surpasses recent state-of-the-art explicit unsupervised DA methods in tackling cross-corpus SER tasks.

求助该文献

最长约 10秒，即可获得该文献文件

Layer-Adapted Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

今日热心研友