Performance of Large Language Models on Medical Oncology Examination Questions

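Clinical oncology; Medicine; Harm; Oncology; Internal medicine; Clinical trial; Set (abstract data type); Family medicine; Psychology; Cancer; Computer science; Social psychology; Programming language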
Authors
Jack B. Longwell, Ian Hirsch, Fernando Binder, Galileo Arturo Gonzalez Conchas, Daniel Mau, Raymond Jang, Rahul G. Krishnan, Robert C. Grant
Source
Journal: JAMA Network Open [American Medical Association]
Volume (issue): 7 (6): e2417641 Citations: 5
Identifier
DOI:10.1001/jamanetworkopen.2024.17641
Abstract

Importance Large language models (LLMs) recently developed an unprecedented ability to answer questions. Studies of LLMs from other fields may not generalize to medical oncology, a high-stakes clinical setting requiring rapid integration of new information.
Objective To evaluate the accuracy and safety of LLM answers on medical oncology examination questions.
Design, Setting, and Participants This cross-sectional study was conducted between May 28 and October 11, 2023. The American Society of Clinical Oncology (ASCO) Oncology Self-Assessment Series on ASCO Connection, the European Society of Medical Oncology (ESMO) Examination Trial questions, and an original set of board-style medical oncology multiple-choice questions were presented to 8 LLMs.
Main Outcomes and Measures The primary outcome was the percentage of correct answers. Medical oncologists evaluated the explanations provided by the best LLM for accuracy, classified the types of errors, and estimated the likelihood and extent of potential clinical harm.
Results Proprietary LLM 2 correctly answered 125 of 147 questions (85.0%; 95% CI, 78.2%-90.4%; P < .001 vs random answering). Proprietary LLM 2 outperformed an earlier version, proprietary LLM 1, which correctly answered 89 of 147 questions (60.5%; 95% CI, 52.2%-68.5%; P < .001), and the best open-source LLM, Mixtral-8x7B-v0.1, which correctly answered 87 of 147 questions (59.2%; 95% CI, 50.0%-66.4%; P < .001). The explanations provided by proprietary LLM 2 contained no or minor errors for 138 of 147 questions (93.9%; 95% CI, 88.7%-97.2%). Incorrect responses were most commonly associated with errors in information retrieval, particularly with recent publications, followed by erroneous reasoning and reading comprehension. If acted upon in clinical practice, 18 of 22 incorrect answers (81.8%; 95% CI, 59.7%-94.8%) would have a medium or high likelihood of moderate to severe harm.
Conclusions and Relevance In this cross-sectional study of the performance of LLMs on medical oncology examination questions, the best LLM answered questions with remarkable performance, although errors raised safety concerns. These results demonstrated an opportunity to develop and evaluate LLMs to improve health care clinician experiences and patient care, considering the potential impact on capabilities and safety.
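The Results report each proportion with a 95% CI but the abstract does not say how the intervals were computed. As a rough illustration only, the sketch below recomputes the proportions from the reported counts and attaches an exact (Clopper-Pearson) binomial interval. The choice of Clopper-Pearson is an assumption, not the authors' stated method, and the helper names `binom_cdf` and `exact_ci` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials.

    Assumption: the abstract does not name its CI method; this exact
    interval is used here purely for illustration.
    """

    def bisect_increasing(g, target):
        # Find p in [0, 1] with g(p) == target, where g increases with p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if g(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else bisect_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2
    )
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else bisect_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2
    )
    return lower, upper


# Counts taken directly from the Results paragraph of the abstract.
reported = {
    "Proprietary LLM 2, correct answers": (125, 147),
    "Proprietary LLM 1, correct answers": (89, 147),
    "Mixtral-8x7B-v0.1, correct answers": (87, 147),
    "Proprietary LLM 2, explanations with no or minor errors": (138, 147),
    "Incorrect answers with medium or high likelihood of harm": (18, 22),
}

for label, (k, n) in reported.items():
    low, high = exact_ci(k, n)
    print(f"{label}: {k}/{n} = {100 * k / n:.1f}% "
          f"(95% CI, {100 * low:.1f}%-{100 * high:.1f}%)")
```

Intervals produced this way may differ slightly from the published ones if the authors used a different method (for example a Wilson or bootstrap interval), so any mismatch should be read as a difference in method rather than an error in the abstract.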