
The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study

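Keywords: emergency triage, observational study, software deployment, task (project management), machine learning, computer science, data science, medicine, emergency medical care, artificial intelligence, pathology, engineering, systems engineering, software engineering
Authors
David M. Levine, Rudraksh Tuwani, Benjamin Kompa, Amita Varma, Samuel G. Finlayson, Ateev Mehrotra, Andrew L. Beam
Source
Journal: The Lancet Digital Health [Elsevier]
Volume (issue): 6 (8): e555-e561 Citations: 22
Identifier
DOI: 10.1016/s2589-7500(24)00097-9
Abstract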

Background

Artificial intelligence (AI) applications in health care have been effective in many areas of medicine, but they are often trained for a single task using labelled data, making deployment and generalisability challenging. How well a general-purpose AI language model performs diagnosis and triage relative to physicians and laypeople is not well understood.

Methods

We compared the predictive accuracy of Generative Pre-trained Transformer 3 (GPT-3)'s diagnostic and triage ability for 48 validated synthetic case vignettes (<50 words; sixth-grade reading level or below) of both common (eg, viral illness) and severe (eg, heart attack) conditions to a nationally representative sample of 5000 lay people from the USA who could use the internet to find the correct options and 21 practising physicians at Harvard Medical School. There were 12 vignettes for each of four triage categories: emergent, within 1 day, within 1 week, and self-care. The correct diagnosis and triage category (ie, ground truth) for each vignette was determined by two general internists at Harvard Medical School. For each vignette, human respondents and GPT-3 were prompted to list diagnoses in order of likelihood, and the vignette was marked as correct if the ground-truth diagnosis was in the top three of the listed diagnoses. For triage accuracy, we examined whether the human respondents' and GPT-3's selected triage was exactly correct according to the four triage categories, or matched a dichotomised triage variable (emergent or within 1 day vs within 1 week or self-care). We estimated GPT-3's diagnostic and triage confidence on a given vignette using a modified bootstrap resampling procedure, and examined how well calibrated GPT-3's confidence was by computing calibration curves and Brier scores. We also performed subgroup analysis by case acuity, and an error analysis for triage advice to characterise how its advice might affect patients using this tool to decide if they should seek medical care immediately.

Findings

Among all cases, GPT-3 replied with the correct diagnosis in its top three for 88% (42/48, 95% CI 75–94) of cases, compared with 54% (2700/5000, 53–55) for lay individuals (p<0.0001) and 96% (637/666, 94–97) for physicians (p=0.012). GPT-3 triaged 70% correct (34/48, 57–82) versus 74% (3706/5000, 73–75; p=0.60) for lay individuals and 91% (608/666, 89–93; p<0.0001) for physicians. As measured by the Brier score, GPT-3 confidence in its top prediction was reasonably well calibrated for diagnosis (Brier score=0.18) and triage (Brier score=0.22). We observed an inverse relationship between case acuity and GPT-3 accuracy (p<0.0001) with a fitted trend line of -8.33% decrease in accuracy for every level of increase in case acuity. For triage error analysis, GPT-3 deprioritised truly emergent cases in seven instances.

Interpretation

A general-purpose AI language model without any content-specific training could perform diagnosis at levels close to, but below, physicians and better than lay individuals. We found that GPT-3's performance was inferior to physicians for triage, sometimes by a large margin, and its performance was closer to that of lay individuals. Although the diagnostic performance of GPT-3 was comparable to physicians, it was significantly better than a typical person using a search engine.

Funding

The National Heart, Lung, and Blood Institute.
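
The two statistics quoted in the Findings can be illustrated with a short Python sketch. The abstract does not say how the 95% confidence intervals were computed; the Wilson score interval used here is an assumption, chosen only because it reproduces the reported bounds for GPT-3. Likewise, `brier_score` is the standard binary form (mean squared gap between a stated confidence and a 0/1 outcome), which may differ in detail from the authors' bootstrap-based procedure. Both function names are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def wilson_interval(successes: int, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: the abstract does not name its interval method.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * (p * (1 - p) / n + z ** 2 / (4 * n ** 2)) ** 0.5 / denom
    return centre - half, centre + half


def brier_score(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between confidence and outcome (1 = correct, 0 = wrong)."""
    if len(confidences) != len(outcomes) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((c - o) ** 2 for c, o in zip(confidences, outcomes))
    return total / len(confidences)


if __name__ == "__main__":
    # Counts taken from the Findings: 42/48 diagnoses and 34/48 triage calls correct.
    for label, correct in (("diagnosis", 42), ("triage", 34)):
        low, high = wilson_interval(correct, 48)
        print(label, round(low * 100), round(high * 100))
```

Under this assumption the bounds round to 75–94 for diagnosis and 57–82 for triage, matching the abstract. For scale, a model that always reports 50% confidence scores a Brier score of 0.25 and a perfect one scores 0, so the reported 0.18 and 0.22 sit between those two reference points.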