Persona
Perception
Narration
Computer Science
Multimedia
Quality (philosophy)
Instructional Design
Human–Computer Interaction
Psychology
Linguistics
Epistemology
Philosophy
Neuroscience
Authors
Bin Jing,Changcheng Wu,Zhongling Pi,Yu Zhou,Yuxi Zhang,Hongchao Liu
Identifier
DOI:10.1080/00220973.2024.2446169
Abstract
The rapid development of artificial intelligence technology has significantly improved the quality of computer-synthesized voices in modern text-to-speech (TTS) engines. Various appealing attributes can be added to these synthesized voices to support their widespread use in instructional videos. However, whether such synthesized voices can replace high-quality human-recorded voices remains uncertain. We conducted an eye-tracking experiment to examine the learning outcomes of instructional videos, comparing differences in learning performance, attentional engagement, and persona perceptions between a human-recorded voice and two computer-synthesized voices (formal and cute) generated by a modern TTS engine. Thirty university students participated in this study, with their eye movements recorded and analyzed as they watched instructional videos featuring different forms of narration. Overall, no statistically significant differences in persona perceptions were found between participants who learned from the human-recorded voice and those who learned from the two synthesized voices. However, the human-recorded voice significantly improved learning performance and attentional engagement. Our results indicate that although software-generated voices are now perceived as comparable in quality to human ones, they do not confer the same benefits for learning performance and attention. Therefore, we recommend that instructional video designers prioritize human-recorded voices over software-synthesized voices.