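Scale (instrument)
Curriculum
Set (abstract data type)
Computer science
Mathematics education
Embedding
Science and engineering
Artificial intelligence
Mathematics
Engineering
Programming language
Pedagogy
Psychology
Geometry
Engineering ethics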
Authors
Sarah Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando Solar-Lezama, Iddo Drori
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Citations: 11
Identifiers
DOI: 10.48550/arxiv.2306.08997
Abstract
We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set excluding questions based on images. We fine-tune an open-source large language model on this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed performance breakdown by course, question, and answer type. By embedding questions in a low-dimensional space, we explore the relationships between questions, topics, and classes and discover which questions and classes are required for solving other questions and classes through few-shot learning. Our analysis offers valuable insights into course prerequisites and curriculum design, highlighting language models' potential for learning and improving Mathematics and EECS education.
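The abstract says questions are embedded in a low-dimensional space and related to one another through few-shot learning. It does not say how few-shot exemplars are chosen. As an illustration only, here is a minimal sketch of one generic approach, nearest-neighbor retrieval by cosine similarity, which is not necessarily the authors' procedure. It assumes each question already has an embedding vector. The question ids, the toy 3-dimensional vectors, and the helper names `cosine` and `nearest_exemplars` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_exemplars(query, bank, k=2):
    """Return the ids of the k questions in `bank` most similar to `query`.

    `bank` maps a (hypothetical) question id to its embedding vector.
    The selected questions would serve as few-shot exemplars in a prompt.
    """
    ranked = sorted(bank, key=lambda qid: cosine(query, bank[qid]), reverse=True)
    return ranked[:k]


# Hypothetical toy embeddings; real vectors would come from an embedding model.
bank = {
    "linalg-q1": [0.9, 0.1, 0.0],
    "linalg-q2": [0.8, 0.2, 0.1],
    "algo-q1": [0.1, 0.9, 0.2],
    "algo-q2": [0.0, 0.8, 0.3],
}
query = [0.85, 0.15, 0.05]  # embedding of a new, unseen question

print(nearest_exemplars(query, bank, k=2))  # → ['linalg-q1', 'linalg-q2']
```

Cosine similarity compares only the direction of the vectors, so questions on the same topic rank together regardless of vector magnitude. For a large question bank, sorting every entry would likely give way to an approximate nearest-neighbor index.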