The performance of ChatGPT on orthopaedic in-service training exams: A comparative study of the GPT-3.5 turbo and GPT-4 models in orthopaedic education

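Medicine; Turbo; Inference; Military service; Internal medicine; Artificial intelligence; Computer science; Engineering; Archaeology; Automotive engineering; History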
Authors
Michael G. Rizzo, Nathan Cai, David S. Constantinescu
Source
Journal: Journal of Orthopaedics [Elsevier]
Volume/issue: 50: 70-75 Cited by: 28
Identifier
DOI:10.1016/j.jor.2023.11.056
Abstract

The rapid advancement of artificial intelligence (AI), particularly the development of Large Language Models (LLMs) such as Generative Pretrained Transformers (GPTs), has revolutionized numerous fields. The purpose of this study is to investigate the application of LLMs to orthopaedic in-service training examinations. Questions from the 2020–2022 Orthopaedic In-Service Training Exams (OITEs) were given to OpenAI's GPT-3.5 Turbo and GPT-4 LLMs using a zero-shot inference approach. Each model was given each multiple-choice question without prior exposure to similar queries, and its generated response was compared to the correct answer for that OITE. The models were evaluated on overall accuracy, performance on questions with and without media, and performance on first- and higher-order questions. GPT-4 outperformed GPT-3.5 Turbo across all years and question categories (2022: 67.63% vs. 50.24%; 2021: 58.69% vs. 47.42%; 2020: 59.53% vs. 46.51%). Both models performed better on questions without associated media, with GPT-4 attaining accuracies of 68.80%, 65.14%, and 68.22% for 2022, 2021, and 2020, respectively. GPT-4 outscored GPT-3.5 Turbo on first-order questions across all years (2022: 63.83% vs. 38.30%; 2021: 57.45% vs. 50.00%; 2020: 65.74% vs. 53.70%). GPT-4 also outscored GPT-3.5 Turbo on higher-order questions across all years (2022: 68.75% vs. 53.75%; 2021: 59.66% vs. 45.38%; 2020: 53.27% vs. 39.25%). GPT-4 showed improved performance compared to GPT-3.5 Turbo in all tested categories. The results reflect the potential and limitations of AI in orthopaedics. GPT-4's performance is comparable to that of a second-to-third-year resident and GPT-3.5 Turbo's performance is comparable to that of a first-year resident, suggesting that current LLMs can neither pass the OITE nor substitute for orthopaedic training.
This study sets a precedent for future endeavors integrating GPT models into orthopaedic education and underlines the necessity for specialized training of these models for specific medical domains.
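The scoring described in the abstract (comparing each model response to the answer key, then reporting accuracy overall and by category) might be sketched as follows. This is a minimal illustration, not the authors' code: the record fields (`year`, `has_media`, `order`, `model_answer`, `correct_answer`) and the function `accuracy_by_group` are hypothetical names, and the toy records are invented for demonstration and are not data from the study.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group_value: percent_correct} for records grouped by `key`."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        group = r[key]
        totals[group] += 1
        # A response counts as correct only if it matches the answer key.
        if r["model_answer"] == r["correct_answer"]:
            correct[group] += 1
    return {g: 100.0 * correct[g] / totals[g] for g in totals}


# Hypothetical toy records (assumption: NOT taken from the study).
records = [
    {"year": 2022, "has_media": False, "order": "first",
     "model_answer": "A", "correct_answer": "A"},
    {"year": 2022, "has_media": True, "order": "higher",
     "model_answer": "B", "correct_answer": "C"},
    {"year": 2021, "has_media": False, "order": "higher",
     "model_answer": "D", "correct_answer": "D"},
    {"year": 2021, "has_media": True, "order": "first",
     "model_answer": "A", "correct_answer": "A"},
]

by_year = accuracy_by_group(records, "year")        # accuracy per exam year
by_media = accuracy_by_group(records, "has_media")  # with vs. without media
by_order = accuracy_by_group(records, "order")      # first- vs. higher-order
```

Grouping by a single key at a time mirrors how the abstract reports results (by year, by media presence, by question order); a real analysis would presumably run this once per model.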