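Phrase
Computer science
Speech recognition
Segmentation
Artificial intelligence
Natural language processing
Feature extraction
Prosody
Autoencoder
Feature (linguistics)
Artificial neural network
Linguistics
Philosophy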
Authors
Jia-Hao Hsu,Chung-Hsien Wu,Tsung‐Hsien Yang
Identifier
DOI:10.23919/apsipaasc55919.2022.9980239
Abstract
Speech emotion recognition is an important area of human-computer interaction research. Understanding the user's emotions from speech helps a system infer underlying information about the user, such as satisfaction with the service. This research attempts to detect emotion in user speech recorded by customer service dialogue systems for telecommunication applications. This study proposes a prosodic phrase-based Vector Quantized Variational AutoEncoder (VQVAE) as the feature extraction module for the pre-trained model Audio ALBERT (AALBERT). Two steps are added before fine-tuning the pre-trained AALBERT model: prosodic phrase segmentation and a prosodic phrase-based VQVAE model. Speech segments are extracted using the prosodic phrase segmentation algorithm, and each segment is assumed to contain only a single emotion. The VQVAE model is then trained to obtain quantized vectors of the important prosodic phrases. In the experiment, a speech corpus collected by a telecom customer service system was used for evaluation. The ablation study shows that the proposed method effectively improves the performance of the pre-trained model, reaching an accuracy of 91.41%. These results suggest that feature extraction based on prosodic segmentation and prosodic phrase quantization has potential in the field of speech emotion recognition.
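The quantization step at the core of a VQVAE can be illustrated as a nearest-codebook lookup. The following is a minimal sketch and not the authors' implementation: the function name `quantize`, the toy codebook, and the two-dimensional vectors are all assumptions made for illustration, and the encoder, decoder, and training losses of a real VQVAE are omitted.

```python
import numpy as np

def quantize(vectors, codebook):
    """Map each input vector to its nearest codebook entry (Euclidean distance)."""
    # Pairwise differences via broadcasting: shape (num_vectors, num_codes, dim)
    diffs = vectors[:, None, :] - codebook[None, :, :]
    # Squared distances: shape (num_vectors, num_codes)
    dists = np.sum(diffs ** 2, axis=-1)
    # Index of the closest code for each input vector
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]

# Hypothetical toy data: three codebook entries and three "phrase" vectors
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
phrase_vectors = np.array([[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]])

indices, quantized = quantize(phrase_vectors, codebook)
print(indices)    # index of the nearest code per vector
print(quantized)  # the corresponding codebook vectors
```

In a full model, the inputs would be encoder outputs for each prosodic phrase segment, and the codebook would be learned jointly with the encoder and decoder rather than fixed by hand.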