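Keystroke logging
Computer science
Coding (social sciences)
Keystroke dynamics
Artificial intelligence
Ensemble learning
Machine learning
Ensemble forecasting
Speech recognition
Natural language processing
Statistics
Computer security
Mathematics
Password
S/KEY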
Authors
Muhammad Fawad Khan, John Edwards, Paul Bodily, Hamid Reza Karimi
Identifier
DOI:10.1109/bigdata59044.2023.10386085
Abstract
Keystroke data in programming reveals intricate patterns that reflect the behavior of programmers. These patterns hold promise for predicting grades and other applications, providing insights into the skills of both proficient and less proficient programmers. Analyzing these patterns can yield tailored feedback for students who need support, enabling effective interventions. Our study utilizes a keystroke dataset from the CS1 (Introduction to Computer Science) course at Utah State University. We developed novel features by combining elements like key presses, timestamps, source locations, and programming terminology, drawing on prior research, our insights, and an analysis of programming behavior. An ensemble-based feature selection method identifies key features, which are then used in hyperparameter optimization and grade prediction with six classification and three regression algorithms. We categorized grades into three levels: Low, Average, and High. Despite challenges such as class imbalance, plagiarism, limited data per assignment, and the ceiling effect, we attained a notable weighted F1 score of 78%. We also introduce an ensemble classification strategy, merging Isolation Forest outlier detection with a refined Random Forest classifier, achieving 80% accuracy on our test set. Additionally, we provide a detailed interpretation of our features, supported by results and a case study of our dataset. This research aims to enhance computer science education at the undergraduate level, focusing on improving its overall quality. Code and data are available at https://github.com/DSAatUSU/Student-Coding-Behavior.git.
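The abstract reports a weighted F1 score over three grade levels under class imbalance. As a rough illustration of what that metric measures, the following is a minimal sketch in plain Python. It is not taken from the authors' repository: the function name `weighted_f1` and the sample labels are hypothetical and made up for this example. The weighted F1 is the mean of the per-class F1 scores, with each class weighted by its share of the true labels, so a frequent class such as High counts for more than a rare class such as Low.

```python
from collections import Counter

# The three grade levels named in the abstract.
LEVELS = ("Low", "Average", "High")


def weighted_f1(y_true, y_pred, labels=LEVELS):
    """Return the support-weighted mean of the per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1 = 2 * tp / denom if denom else 0.0
        score += support[label] / total * f1
    return score


# Hypothetical, imbalanced labels (NOT data from the paper).
y_true = ["High"] * 6 + ["Average"] * 3 + ["Low"] * 1
y_pred = ["High"] * 5 + ["Average"] * 3 + ["High"] + ["Low"]

print(round(weighted_f1(y_true, y_pred), 3))
```

Because the weights follow class frequency, this score can differ noticeably from an unweighted (macro) average when the grade distribution is skewed, which is worth keeping in mind when reading the 78% figure.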