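Computer science
Latency (audio)
Speech recognition
Microphone
Word error rate
Convolution (computer science)
Frame (networking)
Artificial neural network
Task (project management)
Frame rate
Artificial intelligence
Telecommunications
Sound pressure
Economics
Management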
Authors
Vijayaditya Peddinti, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Source
Journal: IEEE Signal Processing Letters
[Institute of Electrical and Electronics Engineers]
Date: 2017-07-04
Volume (issue): 25 (3): 373-377
Citations: 179
Identifiers
DOI: 10.1109/lsp.2017.2723507
Abstract
Bidirectional long short-term memory (BLSTM) acoustic models provide a significant word error rate reduction compared to their unidirectional counterpart, as they model both the past and future temporal contexts. However, it is nontrivial to deploy bidirectional acoustic models for online speech recognition due to an increase in latency. In this letter, we propose the use of temporal convolution, in the form of time-delay neural network (TDNN) layers, along with unidirectional LSTM layers to limit the latency to 200 ms. This architecture has been shown to outperform the state-of-the-art low frame rate (LFR) BLSTM models. We further improve these LFR BLSTM acoustic models by operating them at higher frame rates at lower layers and show that the proposed model performs similarly to these mixed frame rate BLSTMs. We present results on the Switchboard 300 h LVCSR task and the AMI LVCSR task, in the three microphone conditions.