重点(电信)
计算机科学
副语言
话语
自然性
语音识别
抄写(语言学)
自然语言处理
条件随机场
人工智能
语音合成
语言学
量子力学
电信
物理
哲学
作者
Quoc Truong,Tomoki Toda,Graham Neubig,Sakriani Sakti,Satoshi Nakamura
出处
期刊:IEEE/ACM transactions on audio, speech, and language processing
[Institute of Electrical and Electronics Engineers]
日期:2017-03-01
卷期号:25 (3): 544-556
被引量:19
标识
DOI:10.1109/taslp.2016.2643280
摘要
Speech-to-speech translation (S2ST) is a technology that translates speech across languages, which can remove barriers in cross-lingual communication. In the conventional S2ST systems, the linguistic meaning of speech was translated, but paralinguistic information conveying other features of the speech such as emotion or emphasis were ignored. In this paper, we propose a method to translate paralinguistic information, specifically focusing on emphasis. The method consists of a series of components that can accurately translate emphasis using all acoustic features of speech. First, linear-regression hidden semi-Markov models (LRHSMMs) are used to estimate a real-numbered emphasis value for every word in an utterance, resulting in a sequence of values for the utterance. After that the emphasis translation module translates the estimated emphasis sequence into a target language emphasis sequence using a conditional random field model considering the features of emphasis levels, words, and part-of-speech tags. Finally, the speech synthesis module synthesizes emphasized speech with LR-HSMMs, taking into account the translated emphasis sequence and transcription. The results indicate that our translation model can translate emphasis information, correctly emphasizing words in the target language with 91.6% F-measure by objective evaluation. A listening test with human subjects further showed that they could identify the emphasized words with 87.8% F-measure, and that the naturalness of the audio was preserved.
科研通智能强力驱动
Strongly Powered by AbleSci AI