Computer science
Pronunciation
Artificial intelligence
Speech recognition
Mel cepstrum
Reading (process)
Graph
Encoder
Feature (linguistics)
Natural language processing
Pattern recognition (psychology)
Feature extraction
Linguistics
Philosophy
Theoretical computer science
Operating system
Authors
Qing Wang, Weiping Liu, Wang Xiu, Xinghong Chen, Guannan Chen, Qingxiang Wu
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2023-12-01
Volume/Issue: 34 (12): 10294-10308
Citations: 12
Identifier
DOI: 10.1109/tnnls.2022.3165554
Abstract
With the development of artificial intelligence, speech recognition and prediction have become important research domains with wide applications, such as intelligent control, education, individual identification, and emotion analysis. Chinese poetry reading contains rich features of continuous pronunciation, such as mood, emotion, rhythm schemes, lyric reading, and artistic expression. Therefore, predicting the pronunciation characteristics of Chinese poetry reading is significant for demonstrating high-level machine intelligence and has the potential to support a high-level intelligent system for teaching children to read Tang poetry. The Mel frequency cepstral coefficient (MFCC) is currently used to represent important speech features. Due to the complexity and high degree of nonlinearity in poetry reading, however, accurate pronunciation feature prediction faces a tough challenge: how to model complex spatial correlations and temporal dynamics, such as rhyme schemes. Many current methods ignore the spatial and temporal characteristics of the MFCC representation. In addition, these methods are subject to certain limitations in long-term prediction performance. To solve these problems, we propose a novel spatial-temporal graph model based on multihead attention (STGM-MHA) for the pronunciation feature prediction of Chinese poetry. The STGM-MHA is designed with an encoder-decoder structure: the encoder compresses the data into a hidden-space representation, while the decoder reconstructs the hidden-space representation as output. In the model, a novel gated recurrent unit (GRU) module based on multihead attention (AGRU) is proposed to extract the spatial and temporal features of MFCC data effectively. An evaluation of the proposed model against state-of-the-art methods on six datasets reveals its clear advantage.
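The abstract outlines an encoder-decoder architecture in which an attention-augmented GRU (AGRU) extracts spatial and temporal features from MFCC sequences. The following is a minimal PyTorch sketch of that general idea only; the class names, residual and normalization choices, and hyperparameters are illustrative assumptions, not the paper's actual STGM-MHA implementation.

```python
# Minimal sketch (not the authors' implementation) of an encoder-decoder built
# from GRU blocks combined with multi-head self-attention over MFCC frames.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class AGRUBlock(nn.Module):
    """GRU followed by multi-head self-attention over its hidden states."""

    def __init__(self, input_dim: int, hidden_dim: int, num_heads: int = 4):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim) -> (batch, time, hidden_dim)
        h, _ = self.gru(x)
        # Self-attention lets each frame attend to all other frames,
        # capturing longer-range temporal context than the GRU alone.
        a, _ = self.attn(h, h, h)
        return self.norm(h + a)  # residual connection + layer norm


class EncoderDecoderSketch(nn.Module):
    """Encoder compresses MFCC frames into a hidden space; decoder maps back."""

    def __init__(self, n_mfcc: int = 13, hidden_dim: int = 64):
        super().__init__()
        self.encoder = AGRUBlock(n_mfcc, hidden_dim)
        self.decoder = AGRUBlock(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_mfcc)

    def forward(self, mfcc: torch.Tensor) -> torch.Tensor:
        z = self.encoder(mfcc)   # hidden-space representation
        y = self.decoder(z)      # reconstruction in hidden space
        return self.out(y)       # project back to MFCC dimensionality


if __name__ == "__main__":
    # Dummy batch: 2 utterances, 100 frames, 13 MFCCs per frame.
    frames = torch.randn(2, 100, 13)
    model = EncoderDecoderSketch()
    pred = model(frames)
    print(pred.shape)  # torch.Size([2, 100, 13])
```

Combining recurrence with self-attention in this way lets the GRU model local frame-to-frame dynamics while attention handles longer-range dependencies across the utterance, which is the kind of spatial-temporal coupling the abstract attributes to the AGRU module.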