Keywords: Computer science; Spectrogram; Artificial intelligence; Feature extraction; Artificial neural network; Pattern recognition (psychology); Speech recognition; Deep learning; Principal component analysis; Rendering (computer graphics); Feature (linguistics); Emotion recognition; Machine learning; Linguistics; Philosophy
Authors: Apeksha Aggarwal, Akshat Srivastava, Ajay Agarwal, Nidhi Chahal, Dilbag Singh, Abeer Ali Alnuaim, Aseel Alhadlaq, Heung-No Lee
Source
Journal: Sensors (Multidisciplinary Digital Publishing Institute)
Date: 2022-03-19
Volume/Issue: 22 (6): 2378
Citations: 37
Abstract
Recognizing human emotions by machine is a complex task. Deep learning models attempt to automate this process by enabling machines to learn from data, but identifying human emotions from speech with good performance remains challenging. With the advent of deep learning algorithms, this problem has recently been addressed; however, most past research relied on a single feature-extraction method for training. In this work, we explore two different feature-extraction methods for effective speech emotion recognition. First, two-way feature extraction is proposed, using super-convergence to extract two sets of potential features from the speech data. For the first set, principal component analysis (PCA) is applied to the numeric speech features, and a deep neural network (DNN) with dense and dropout layers is then trained on the reduced features. In the second approach, mel-spectrogram images are extracted from the audio files, and these 2D images are given as input to a pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis of both feature-extraction methods, across multiple algorithms and two datasets, are performed in this work. On the RAVDESS dataset, the mel-spectrogram approach provided significantly better accuracy than the numeric features with a DNN.
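The first pipeline the abstract describes reduces numeric speech features with PCA before passing them to a DNN. A minimal sketch of that PCA stage, using SVD on synthetic data, is shown below; the feature dimensionality (180), the number of retained components (40), and the random data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components via SVD."""
    X_centered = X - X.mean(axis=0)          # PCA requires mean-centered data
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T  # scores in component space

# Synthetic stand-in for extracted speech features (assumption):
# 100 audio clips, 180 raw numeric features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 180))
Z = pca_reduce(X, 40)                        # keep 40 components
print(Z.shape)                               # (100, 40)
```

The reduced matrix `Z` would then be fed to a dense network with dropout layers, as in the paper's first approach; the second approach instead renders mel-spectrogram images and passes them to a pre-trained VGG-16.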