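Computer science
Graph
Speech recognition
Emotion recognition
Artificial intelligence
Natural language processing
Theoretical computer science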
Authors
Ang Chen, Rongqing Huang, Xin Tong, Liang Wu, Wangdui Bianba
Abstract
Speech Emotion Recognition (SER) is a crucial component of Human-Computer Interaction (HCI), with significant implications for both research and practical applications. However, owing to the complexity of the Tibetan language and the scarcity of datasets, which stems from the difficulty of collecting speech across its various dialects, research on Tibetan speech recognition remains limited. Having constructed TBLS1, a dataset of 6,000 Tibetan-language speech samples, we devise an approach to Tibetan speech emotion recognition. The approach uses MFCC features and incorporates a Bi-directional Long Short-Term Memory (Bi-LSTM) network within a graph convolutional neural network. Finally, by comparing the performance of different models on this dataset, we demonstrate the feasibility of our model for Tibetan speech emotion recognition.