Pengcheng Huang, Li Li, Chunyan Wu, Xiaoqian Zhang, Z. Y. Liu
Identifier
DOI:10.1145/3603781.3603871
Abstract
Automated essay scoring systems are widely used in education, and off-topic essay detection is an integral part of them. Traditional off-topic detection relies on text features represented in a vector space; this approach captures only the structure of the essay's sentences and requires manually engineered features. This paper proposes using the Sentence-BERT model to detect off-topic essays. The method first collects a large amount of high-quality data to build a corpus of off-topic essays. A Siamese pair of pre-trained models then embeds the sentences of the essay topic and of the essay body, producing semantically rich sentence vectors. The sentence vectors are mean-pooled, cosine similarity is computed between the topic and the body, and the optimal threshold for flagging an essay as off-topic is selected through repeated training. Experimental results show that the proposed method improves accuracy, recall, and F1 by 9.5%, 11.2%, and 10.4%, respectively, over a Siamese network based on C-BGRU (Convolutional-Bidirectional Gate Recurrent Unit), and that it also performs well on topics with different degrees of divergence.
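The scoring step described in the abstract (mean pooling, cosine similarity between topic and body, threshold selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-dimensional vectors stand in for Sentence-BERT embeddings, the function names are hypothetical, and choosing the threshold by maximizing F1 over a candidate grid is an assumption, since the abstract does not say how the optimal threshold is found.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length sentence vectors into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def f1_at_threshold(scores, labels, threshold):
    """F1 for the off-topic class; an essay is flagged when score < threshold.

    labels[i] is True when essay i is actually off-topic.
    """
    tp = fp = fn = 0
    for score, is_off_topic in zip(scores, labels):
        predicted_off = score < threshold
        if predicted_off and is_off_topic:
            tp += 1
        elif predicted_off and not is_off_topic:
            fp += 1
        elif not predicted_off and is_off_topic:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, candidates):
    """Assumed selection rule: the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Toy stand-ins for sentence embeddings (assumption: real ones come from SBERT).
topic_sentences = [[1.0, 0.0, 0.0]]
on_topic_body = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
off_topic_body = [[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]]

topic_vec = mean_pool(topic_sentences)
on_score = cosine_similarity(topic_vec, mean_pool(on_topic_body))
off_score = cosine_similarity(topic_vec, mean_pool(off_topic_body))
```

In this toy setup `on_score` is much higher than `off_score`, so any threshold between the two separates the essays. With real embeddings the two score distributions overlap, which is why the threshold has to be tuned on labeled data.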