Computer science
Quality (philosophy)
Natural language generation
Subject-matter expert
Artificial intelligence
Language model
Range (aeronautics)
Natural language
Machine learning
Data science
Expert system
Epistemology
Philosophy
Composite material
Materials science
Authors
Zichao Wang, Jakob Valdez, Debshila Basu Mallick, Richard G. Baraniuk
Identifiers
DOI:10.1007/978-3-031-11644-5_13
Abstract
We investigate the utility of large pretrained language models (PLMs) for automatic educational assessment question generation. While PLMs have shown increasing promise in a wide range of natural language applications, including question generation, they can generate unreliable and undesirable content. For high-stakes applications such as educational assessments, it is critical to ensure not only that the generated content is of high quality but also that it relates to the specific content being assessed. In this paper, we investigate the impact of various PLM prompting strategies on the quality of generated questions. We design a series of generation scenarios to compare various generation strategies, evaluating the generated questions via automatic metrics and manual examination. Through empirical evaluation, we identify the prompting strategy that is most likely to lead to high-quality generated questions. Finally, we demonstrate the promising educational utility of questions generated using this best-performing strategy by presenting them alongside human-authored questions to a subject matter expert, who, despite their expertise, could not effectively distinguish between generated and human-authored questions.