Capabilities of GPT-4 in ophthalmology: an analysis of model entropy and progress towards human-level medical question answering

Optometry, Medicine, Ophthalmology, Medical physics, Computational biology, Biology
Authors
Fares Antaki, Daniel Milad, Mark A. Chia, Charles‐Édouard Giguère, Samir Touma, Jonathan El‐Khoury, Pearse A. Keane, Renaud Duval
Source
Journal: British Journal of Ophthalmology [BMJ]
Volume (issue): 108 (10): 1371-1378 Citations: 71
Identifier
DOI: 10.1136/bjo-2023-324438
Abstract

Background Evidence on the performance of Generative Pre-trained Transformer 4 (GPT-4), a large language model (LLM), in the ophthalmology question-answering domain is needed.

Methods We tested GPT-4 on two 260-question multiple choice question sets from the Basic and Clinical Science Course (BCSC) Self-Assessment Program and the OphthoQuestions question banks. We compared the accuracy of GPT-4 models with varying temperatures (creativity setting) and evaluated their responses in a subset of questions. We also compared the best-performing GPT-4 model to GPT-3.5 and to historical human performance.

Results GPT-4-0.3 (GPT-4 with a temperature of 0.3) achieved the highest accuracy among GPT-4 models, with 75.8% on the BCSC set and 70.0% on the OphthoQuestions set. The combined accuracy was 72.9%, which represents an 18.3% raw improvement in accuracy compared with GPT-3.5 (p<0.001). Human graders preferred responses from models with a temperature higher than 0 (more creative). Exam section, question difficulty and cognitive level were all predictive of GPT-4-0.3 answer accuracy. GPT-4-0.3’s performance was numerically superior to human performance on the BCSC (75.8% vs 73.3%) and OphthoQuestions (70.0% vs 63.0%), but the difference was not statistically significant (p=0.55 and p=0.09).

Conclusion GPT-4, an LLM trained on non-ophthalmology-specific data, performs significantly better than its predecessor on simulated ophthalmology board-style exams. Remarkably, its performance tended to be superior to historical human performance, but that difference was not statistically significant in our study.
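The combined accuracy quoted in the abstract can be checked by hand, because both question sets contain 260 questions. The short Python sketch below is illustrative only. The correct-answer counts are not stated in the abstract; they are back-calculated here from the reported percentages and should be treated as assumptions. The GPT-3.5 figure is likewise only implied, on the assumption that the "18.3% raw improvement" means percentage points.

```python
# Illustrative check of the arithmetic in the abstract.
# Assumption: per-set correct counts are inferred from the reported
# percentages; the abstract itself does not state them.

N_QUESTIONS = 260  # size of each question set, as stated in the abstract

# Reported GPT-4-0.3 accuracy (%) on each set
reported = {"BCSC": 75.8, "OphthoQuestions": 70.0}


def implied_correct(pct, n=N_QUESTIONS):
    """Nearest whole number of correct answers consistent with pct."""
    return round(pct / 100 * n)


# Inferred counts (assumption, see above)
counts = {name: implied_correct(pct) for name, pct in reported.items()}

# Pooled accuracy over all 520 questions
total_correct = sum(counts.values())
total_questions = N_QUESTIONS * len(reported)
combined_pct = 100 * total_correct / total_questions

# Implied GPT-3.5 combined accuracy, reading the 18.3% raw improvement
# as percentage points (assumption)
implied_gpt35_pct = round(round(combined_pct, 1) - 18.3, 1)

print(counts)
print(f"combined accuracy: {combined_pct:.1f}%")
print(f"implied GPT-3.5 accuracy: {implied_gpt35_pct:.1f}%")
```

Under these assumptions the pooled figure rounds to the 72.9% reported in the abstract. Because the two sets are the same size, the pooled accuracy equals the simple mean of the two per-set accuracies.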