Assessing GPT-4’s Performance in Delivering Medical Advice: Comparative Analysis With Human Experts

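Medical diagnosis, Clarity, Computer science, Proxy (statistics), Grading (engineering), Health care, Vocabulary, Medical education, Medicine, Psychology, Machine learning, Pathology, Engineering, Biochemistry, Philosophy, Civil engineering, Economics, Chemistry, Economic growth, Linguistics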
Authors
Eunbeen Jo,Sanghoun Song,Jong-Ho Kim,Subin Lim,Ju Hyeon Kim,Jung‐Joon Cha,Young-Min Kim,Hyung Joon Joo
Source
Journal: JMIR Medical Education [JMIR Publications Inc.]
Volume/issue: 10: e51282-e51282 Cited by: 2
Identifier
DOI:10.2196/51282
Abstract

Background: Accurate medical advice is paramount in ensuring optimal patient care, and misinformation can lead to misguided decisions with potentially detrimental health outcomes. The emergence of large language models (LLMs) such as OpenAI’s GPT-4 has spurred interest in their potential health care applications, particularly in automated medical consultation. Yet, rigorous investigations comparing their performance to human experts remain sparse.

Objective: This study aimed to compare the medical accuracy of GPT-4 with that of human experts in providing medical advice using real-world user-generated queries, with a specific focus on cardiology. It also sought to analyze the performance of GPT-4 and human experts in specific question categories, including drug or medication information and preliminary diagnoses.

Methods: We collected 251 pairs of cardiology-specific questions from general users and answers from human experts via an internet portal. GPT-4 was tasked with generating responses to the same questions. Three independent cardiologists (SL, JHK, and JJC) evaluated the answers provided by both human experts and GPT-4. Using a computer interface, each evaluator compared the pairs and determined which answer was superior. They also quantitatively measured the clarity and complexity of the questions as well as the accuracy and appropriateness of the responses, applying a 3-tiered grading scale (low, medium, and high). Furthermore, a linguistic analysis was conducted to compare the length and vocabulary diversity of the responses using word count and type-token ratio.

Results: GPT-4 and human experts displayed comparable efficacy in medical accuracy (“GPT-4 is better” at 132/251, 52.6% vs “Human expert is better” at 119/251, 47.4%). In accuracy level categorization, humans had more high-accuracy responses than GPT-4 (50/237, 21.1% vs 30/238, 12.6%) but also a greater proportion of low-accuracy responses (11/237, 4.6% vs 1/238, 0.4%; P=.001). GPT-4 responses were generally longer and used a less diverse vocabulary than those of human experts, potentially enhancing their comprehensibility for general users (sentence count: mean 10.9, SD 4.2 vs mean 5.9, SD 3.7; P<.001; type-token ratio: mean 0.69, SD 0.07 vs mean 0.79, SD 0.09; P<.001). Nevertheless, human experts outperformed GPT-4 in specific question categories, notably those related to drug or medication information and preliminary diagnoses. These findings highlight the limitations of GPT-4 in providing advice based on clinical experience.

Conclusions: GPT-4 has shown promising potential in automated medical consultation, with medical accuracy comparable to that of human experts. However, challenges remain, particularly in the realm of nuanced clinical judgment. Future improvements in LLMs may require the integration of specific clinical reasoning pathways and regulatory oversight for safe use. Further research is needed to understand the full potential of LLMs across various medical specialties and conditions.
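
The linguistic measures named in the abstract (word count, sentence count, and type-token ratio) can be illustrated with a short sketch. This is not the authors' pipeline: the abstract does not say how responses were tokenized or split into sentences, so the regex-based `tokenize` and `sentence_count` helpers below are hypothetical stand-ins. The type-token ratio itself is the standard definition, the number of distinct word forms divided by the total number of words.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Assumption: a simple Unicode word regex; the study's actual
    tokenizer is not described in the abstract.
    """
    return re.findall(r"\w+", text.lower())


def word_count(text: str) -> int:
    """Total number of word tokens in the text."""
    return len(tokenize(text))


def type_token_ratio(text: str) -> float:
    """Distinct tokens (types) divided by total tokens; 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def sentence_count(text: str) -> int:
    """Naive sentence count: split on runs of '.', '!' or '?'.

    Assumption: good enough for illustration; abbreviations and
    decimals would be miscounted.
    """
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


if __name__ == "__main__":
    # Hypothetical response text, not taken from the study data.
    answer = "Take the pill daily. Take it with water."
    print(word_count(answer), sentence_count(answer), type_token_ratio(answer))
```

A lower ratio means more repeated words, which is the sense in which the abstract describes GPT-4's vocabulary as less diverse. The ratio tends to fall as a text gets longer, so comparing responses of very different lengths needs some care.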