Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis

Computer Science; Data Science
Authors
Qiuhong Wei, Zhengxiong Yao, Ying Cui, Bo Wei, Zhezhen Jin, Ximing Xu
Source
Journal: Journal of Biomedical Informatics [Elsevier BV]
Volume/issue: 151: 104620-104620 Citations: 41
Identifier
DOI: 10.1016/j.jbi.2024.104620
Abstract

Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and to provide direction for future research. An extensive literature search was conducted on June 15, 2023, across ten medical databases. The keyword used was "ChatGPT," without restrictions on publication type, language, or date. Studies evaluating ChatGPT's performance in answering medical questions were included. Exclusions comprised review articles, comments, patents, non-medical evaluations of ChatGPT, and preprint studies. Data were extracted on general study characteristics, question sources, conversation processes, assessment metrics, and the performance of ChatGPT. An evaluation framework for LLMs in medical inquiries was proposed by integrating insights from the selected literature. This study is registered with PROSPERO, CRD42023456327. A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%–60%, I² = 87%) in addressing medical queries. However, the studies varied in question source, question-asking process, and evaluation metrics. Measured against our proposed evaluation framework, many studies failed to report methodological details such as the date of inquiry, the version of ChatGPT, and inter-rater consistency. This review reveals ChatGPT's potential in addressing medical inquiries, but the heterogeneity of study designs and insufficient reporting might affect the reliability of the results. Our proposed evaluation framework provides insights for future study design and for transparent reporting of LLMs responding to medical questions.
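
As a rough illustration of how a pooled accuracy, its 95% CI, and an I² value of the kind quoted in the abstract can be computed, here is a minimal sketch. It assumes DerSimonian-Laird random-effects pooling on logit-transformed proportions; the abstract does not say which pooling model the authors used, so this choice is an assumption. The function name `pool_proportions` and the example counts are hypothetical and are not data from the review.

```python
import math


def pool_proportions(studies):
    """Pool per-study accuracies with a DerSimonian-Laird random-effects model.

    studies: list of (correct, total) pairs with 0 < correct < total.
    Returns (pooled proportion, (ci_low, ci_high), I^2 in percent).
    This is an illustrative sketch, not the review's documented method.
    """
    # Logit-transform each accuracy; the variance of a logit is 1/x + 1/(n - x).
    y = [math.log(x / (n - x)) for x, n in studies]
    v = [1 / x + 1 / (n - x) for x, n in studies]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1

    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit, and a 95% Wald interval.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def inv_logit(t):
        return 1 / (1 + math.exp(-t))

    ci = (inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se))
    return inv_logit(pooled), ci, i2


# Hypothetical (correct answers, questions asked) counts, NOT from the review.
example = [(45, 100), (120, 180), (30, 75), (88, 150)]
estimate, (low, high), i2 = pool_proportions(example)
print(f"pooled accuracy {estimate:.2f} (95% CI {low:.2f} to {high:.2f}), I2 = {i2:.0f}%")
```

A high I², such as the 87% reported above, indicates that most of the observed spread in accuracy reflects differences between studies rather than sampling error, which is consistent with the abstract's caution about heterogeneous study designs.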
