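Autoregressive model
Spectrogram
Computer science
Speech recognition
Flow (mathematics)
Inference
Artificial neural network
Speech synthesis
Artificial intelligence
Mathematics
Statistics
Geometry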
Authors
Mengyu Cao,Shuang Liang,Minchuan Chen,Jun Ma,Shaojun Wang,Jing Xiao
Identifier
DOI:10.1109/icassp40776.2020.9054484
Abstract
In this work, we propose Flow-TTS, a non-autoregressive end-to-end neural TTS model based on generative flow. Unlike other non-autoregressive models, Flow-TTS achieves high-quality speech generation with a single feed-forward network. To our knowledge, Flow-TTS is the first TTS model to use flow in the spectrogram generation network and the first non-autoregressive model that jointly learns the alignment and spectrogram generation through a single network. Experiments on LJSpeech show that the speech quality of Flow-TTS closely approaches that of human speech and is even better than that of the autoregressive model Tacotron 2 (it outperforms Tacotron 2 by 0.09 MOS). Meanwhile, Flow-TTS runs inference about 23 times faster than Tacotron 2, which is comparable to FastSpeech.
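For readers unfamiliar with the term "generative flow" used in the abstract, the following is a minimal sketch of one common flow building block, an affine coupling layer. It is a generic illustration only: the abstract does not describe the layers Flow-TTS actually uses, and the names `coupling_forward`, `coupling_inverse`, and `toy_scale_shift` are hypothetical. The point it shows is that such a transform is exactly invertible and has a cheap log-determinant, which is what allows a single feed-forward pass in either direction.

```python
import math

def toy_scale_shift(xa):
    # Hypothetical stand-in for a learned network that predicts
    # a log-scale and a shift from the untouched half of the input.
    log_s = [math.tanh(a) for a in xa]
    t = [0.5 * a for a in xa]
    return log_s, t

def coupling_forward(x, scale_shift):
    """Affine coupling: keep the first half, transform the second half.

    Assumes len(x) is even. Returns the output and the log-determinant
    of the Jacobian, which is the sum of the predicted log-scales.
    """
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = scale_shift(xa)
    yb = [b * math.exp(ls) + ti for b, ls, ti in zip(xb, log_s, t)]
    return xa + yb, sum(log_s)

def coupling_inverse(y, scale_shift):
    """Exact inverse of coupling_forward, computed in one pass."""
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    log_s, t = scale_shift(ya)  # ya equals xa, so the same values are predicted
    xb = [(b - ti) * math.exp(-ls) for b, ls, ti in zip(yb, log_s, t)]
    return ya + xb

x = [0.3, -1.2, 2.0, 0.7]
y, log_det = coupling_forward(x, toy_scale_shift)
x_rec = coupling_inverse(y, toy_scale_shift)
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift and undo the transform without any iterative procedure.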