Speech recognition
Mel cepstrum
Computer science
Emotion recognition
Artificial intelligence
Pattern recognition (psychology)
Feature extraction
Authors
T. Mary Little Flower, Thirasama Jaya, S. Christopher Ezhil Singh
Source
Journal: Automatika
Publisher: Informa
Date: 2024-07-03
Volume/Issue: 65 (4): 1325-1338
Identifier
DOI: 10.1080/00051144.2024.2371249
Abstract
Speech emotion recognition (SER) is attractive in several domains, such as automated translation, call centres, intelligent healthcare, and human–computer interaction. Deep learning models for emotion identification need considerable labelled data, which is often scarce in the SER field. Efficient emotion identification requires a database with enough speech samples, good features, and a strong classifier. This study uses data augmentation to increase the number of input voice samples and address the data-shortage issue: augmentation enlarges the database by adding white noise to the speech signals. In this work, Mel-frequency Cepstral Coefficient (MFCC) and Mel-frequency Magnitude Coefficient (MFMC) features, along with a one-dimensional convolutional neural network (1D-CNN), are used to classify speech emotions. The datasets used to evaluate the model's performance were AESDD, CAFE, EmoDB, IEMOCAP, and MESD. The 1D-CNN (MFMC) model with data augmentation performed best, with an average accuracy of 99.2% for AESDD, 99.5% for CAFE, 97.5% for EmoDB, 92.4% for IEMOCAP, and 96.9% for MESD. The proposed 1D-CNN (MFMC) with data augmentation outperforms the 1D-CNN (MFCC) without data augmentation in emotion recognition.
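The white-noise augmentation the abstract describes can be illustrated with a short sketch. The paper does not state how the noise level was chosen, so the signal-to-noise-ratio (SNR) parameter and the function name below are assumptions for illustration, not the authors' method:

```python
import numpy as np

def add_white_noise(signal, snr_db, seed=0):
    """Return a noisy copy of `signal` with white Gaussian noise scaled
    so the result has (approximately) the requested SNR in dB.
    `snr_db` is a hypothetical parameter; the paper does not specify it."""
    rng = np.random.default_rng(seed)
    sig_power = np.mean(signal ** 2)
    noise = rng.standard_normal(len(signal))
    noise_power = np.mean(noise ** 2)
    # Scale noise so that sig_power / (scale^2 * noise_power) = 10^(snr_db/10)
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

Calling this once per clip at one or more SNR levels doubles (or more) the number of training samples while leaving the emotion label unchanged, which is the usual rationale for noise-based augmentation in SER.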
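The MFCC feature pipeline (power spectrum, triangular mel filterbank, log, DCT-II) can be sketched for a single windowed frame. This is a minimal textbook-style NumPy implementation, not the authors' code; frame length, filter count, and coefficient count are assumed defaults. The MFMC variant differs in how the spectral magnitudes are processed and is not reproduced here:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_mels=26, n_ceps=13):
    """MFCC for one frame: Hamming window -> power spectrum ->
    triangular mel filterbank -> log -> DCT-II, keep first n_ceps."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Filterbank edges equally spaced on the mel scale from 0 Hz to Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, len(spec)))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    log_energy = np.log(fbank @ spec + 1e-10)
    # DCT-II decorrelates the log filterbank energies
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return dct @ log_energy
```

A sequence of such per-frame vectors (one per 20–30 ms window) forms the 2D feature map that a 1D-CNN can then convolve over along the time axis.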