Benchmarking Large Language Models in Evidence-Based Medicine

Topics: Benchmarking, Computer Science, Natural Language Processing, Data Science, Artificial Intelligence, Business, Marketing
Authors
Jin Li, Yiyan Deng, Qi Sun, Junjie Zhu, Yu Tian, Jingsong Li, Tingting Zhu
Source
Journal: IEEE Journal of Biomedical and Health Informatics [Institute of Electrical and Electronics Engineers]
Pages: 1-14
Identifier
DOI: 10.1109/jbhi.2024.3483816
Abstract

Evidence-based medicine (EBM) represents a paradigm of providing patient care grounded in the most current and rigorously evaluated research. Recent advances in large language models (LLMs) offer a potential solution to transform EBM by automating labor-intensive tasks and thereby improving the efficiency of clinical decision-making. This study explores integrating LLMs into the key stages of EBM, evaluating their abilities across evidence retrieval (PICO extraction, biomedical question answering), synthesis (summarizing randomized controlled trials), and dissemination (medical text simplification). We conducted a comparative analysis of seven LLMs, including both proprietary and open-source models, as well as those fine-tuned on medical corpora. Specifically, we benchmarked the performance of various LLMs on each EBM task under zero-shot settings as baselines, and employed prompting techniques, including in-context learning, chain-of-thought reasoning, and knowledge-guided prompting, to enhance their capabilities. Our extensive experiments revealed the strengths of LLMs, such as remarkable understanding capabilities even in zero-shot settings, strong summarization skills, and effective knowledge transfer via prompting. Prompting strategies such as knowledge-guided prompting proved highly effective (e.g., improving the performance of GPT-4 by 13.10% over zero-shot in PICO extraction). However, the experiments also showed limitations, with LLM performance falling well below state-of-the-art baselines like PubMedBERT in handling named entity recognition tasks. Moreover, human evaluation revealed persistent challenges with factual inconsistencies and domain inaccuracies, underscoring the need for rigorous quality control before clinical application. This study provides insights into enhancing EBM using LLMs while highlighting critical areas for further research. The code is publicly available on GitHub.
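To make the prompting setup described in the abstract concrete, the sketch below shows one way zero-shot versus knowledge-guided prompting for PICO extraction could be wired up. This is a minimal illustration, not the authors' released GitHub code: the prompt wording, the PICO definitions, and the `build_prompt`, `extract_pico`, and `call_llm` names are illustrative assumptions that can be swapped for any chat-completion backend.

```python
# Minimal sketch (assumed, not the paper's implementation) of zero-shot vs.
# knowledge-guided prompting for PICO extraction with a pluggable LLM backend.
from typing import Callable

# Domain knowledge injected into the prompt (the "knowledge-guided" part):
# short definitions of the PICO elements the model should extract.
PICO_DEFINITIONS = """\
P (Population): the patients or problem being addressed.
I (Intervention): the treatment, exposure, or diagnostic test under study.
C (Comparison): the alternative the intervention is compared against, if any.
O (Outcome): the clinical outcomes measured."""


def build_prompt(abstract: str, knowledge_guided: bool = True) -> str:
    """Compose a PICO-extraction prompt, optionally prepending domain knowledge."""
    instruction = (
        "Extract the PICO elements from the randomized controlled trial abstract "
        "below. Answer as JSON with keys P, I, C, O; use null if an element is absent."
    )
    parts = []
    if knowledge_guided:
        parts.append("Use these definitions:\n" + PICO_DEFINITIONS)
    parts.append(instruction)
    parts.append("Abstract:\n" + abstract)
    return "\n\n".join(parts)


def extract_pico(abstract: str, call_llm: Callable[[str], str],
                 knowledge_guided: bool = True) -> str:
    """Run PICO extraction through whatever LLM backend `call_llm` wraps."""
    return call_llm(build_prompt(abstract, knowledge_guided))


if __name__ == "__main__":
    # Placeholder backend: replace with a real chat-completion call (GPT-4,
    # an open-source model, etc.). Echoing metadata keeps the sketch runnable.
    demo_backend = lambda prompt: f"[model output for prompt of {len(prompt)} chars]"
    abstract = ("In 120 adults with type 2 diabetes, drug X was compared with "
                "placebo over 12 weeks; the primary outcome was change in HbA1c.")
    print(extract_pico(abstract, demo_backend, knowledge_guided=True))
```

Running the same abstract with `knowledge_guided=False` gives the zero-shot baseline, mirroring the comparison the study reports between zero-shot prompting and knowledge-guided prompting.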