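Spectrogram
Computer science
Optimal distinctiveness theory
Artificial intelligence
Feature (linguistics)
Speech recognition
Deep learning
Task (project management)
Convolutional neural network
Representation (politics)
Multi-task learning
Feature learning
Emotion classification
Convolution (computer science)
Pattern recognition (psychology)
Artificial neural network
Psychology
Engineering
Politics
Psychotherapist
Systems engineering
Law
Philosophy
Linguistics
Political science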
Authors
Kishor Bhangale,Mohanaprasad Kothandaraman
Identifier
DOI:10.1016/j.apacoust.2023.109613
Abstract
Emotions are crucial to how humans express perception and carry out daily activities such as communication, learning, and decision-making. Recognizing human emotion by machine is a complex task. Recently, deep learning techniques have been widely used to automate this task because they give machines a large learning capacity. However, speech emotion recognition (SER) remains challenging due to language, regional, gender, age, and cultural variations. Most previous SER techniques have used only one type of feature representation to train deep learning algorithms, which limits SER performance. This paper presents a novel Parallel Emotion Network (PEmoNet) that comprises a Deep Convolution Neural Network (DCNN) with three parallel arms for effective SER. The three parallel arms of the proposed PEmoNet accept the Multitaper Mel Frequency Spectrogram (MTMFS), the Gammatonegram spectrogram (GS), and the Constant Q-Transform Spectrogram (CQTS) as inputs to improve the feature distinctiveness of the emotion signal. The performance of the proposed SER scheme is evaluated on the EMODB and RAVDESS datasets in terms of accuracy, recall, precision, and F1-score. The proposed technique achieves 97.14% and 97.41% accuracy on the EMODB and RAVDESS datasets, respectively. These results indicate that the proposed PEmoNet, with its different spectral representation inputs, helps improve the distinctiveness of the emotions and outperforms existing state-of-the-art methods.
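The four evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the per-class scores are averaged, so the macro-averaging below is an assumption, and the function name `classification_metrics` and the toy emotion labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging (an unweighted mean over classes) is an assumption;
    the abstract does not state which averaging the authors used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class receives a false positive
            fn[truth] += 1   # true class receives a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration only; these are not results from the paper.
y_true = ["anger", "anger", "joy", "joy", "sad", "sad"]
y_pred = ["anger", "joy", "joy", "joy", "sad", "anger"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On class-imbalanced emotion datasets, macro-averaged scores can differ noticeably from accuracy, which is one reason all four measures are commonly reported together.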