Computer science
Natural language processing
Open source
Language model
Artificial intelligence
Programming language
Software
Authors
Yijing Song, Qianta Zhu, Huaibo Wang, Qinhua Zheng
Source
Journal: IEEE Transactions on Learning Technologies
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue: 1-11
Identifier
DOI:10.1109/tlt.2024.3396873
Abstract
Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack generalizability, which makes them hard to implement in daily teaching activities. Moreover, online services offering AES and AER charge high fees, and uploading student content to them raises security concerns. In light of these challenges, and recognizing the advancements in large language models (LLMs), we aim to fill these research gaps by analyzing the performance of open-source LLMs on AES and AER tasks. Using a human-scored essay dataset (n = 600) collected in an online assessment, we implemented zero-shot, few-shot, and p-tuning AES methods based on the LLMs and conducted a human-machine consistency check. For AER with LLM support, we conducted a similarity test and a score difference test on the revised essays. The human-machine consistency check shows that the performance of open-source LLMs at the 10B-parameter scale on the AES task is close to that of some deep learning baseline models, and that it can be improved by including the comment alongside the score in each shot or by training continuous prompts. The similarity test and score difference test show that open-source LLMs can effectively accomplish the AER task, improving the quality of the essays while keeping the revisions similar to the original essays. This study reveals a practical path to assisting teachers with student essay scoring and revising in a cost-effective, time-efficient, and content-safe way using open-source LLMs.
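As a rough illustration of the few-shot idea mentioned in the abstract (placing the human comment next to the score in each shot), the sketch below assembles a plain-text prompt. It is not the authors' prompt: the `Shot` class, the `build_few_shot_prompt` function, the instruction wording, and the 0 to 10 score range are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    """One human-scored example essay (hypothetical structure)."""
    essay: str
    score: int
    comment: str


def build_few_shot_prompt(shots: List[Shot], target_essay: str, max_score: int = 10) -> str:
    """Assemble a few-shot scoring prompt in which each shot carries
    both the human score and the human comment.

    The instruction text and score range are assumptions for illustration.
    """
    parts = [f"Score the essay from 0 to {max_score} and give a brief comment."]
    for i, shot in enumerate(shots, start=1):
        parts.append(
            f"Example {i}\n"
            f"Essay: {shot.essay}\n"
            f"Score: {shot.score}\n"
            f"Comment: {shot.comment}"
        )
    # The target essay ends with an open "Score:" slot for the model to complete.
    parts.append(f"Essay: {target_essay}\nScore:")
    return "\n\n".join(parts)


shots = [
    Shot("Online learning saves time.", 4, "Claim is stated but not supported."),
    Shot("Online learning saves commuting time, which students can spend on practice.", 8,
         "Clear claim with a concrete supporting reason."),
]
prompt = build_few_shot_prompt(shots, "Online learning is flexible.")
print(prompt)
```

The resulting string could be passed to any locally hosted open-source LLM; dropping the `Comment:` line from each shot would give a score-only variant for comparison.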