Medical large language models are vulnerable to data-poisoning attacks

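Misinformation, Computer science, Harm, Internet, Internet privacy, Computer security, Health care, Data science, Psychology, World Wide Web, Political science, Social psychology, Law
Authors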
Daniel Alber,Zihao Yang,Anton Alyakin,Eunice Yang,N. Shesh,Aly Valliani,Jeff Zhang,Gabriel R. Rosenbaum,Ashley K. Amend-Thomas,David B. Kurland,C. Kremer,Alexander Eremiev,Bruck Negash,Daniel D. Wiggan,M. Nakatsuka,Karl L. Sangwon,Sean N. Neifert,Hammad A. Khan,Akshay Save,Adhith Palla,Eric A. Grin,Monika Hedman,Mustafa Nasir-Moin,Xujin Chris Liu,Lavender Yao Jiang,Michal Mankowski,Dorry L. Segev,Yindalon Aphinyanaphongs,Howard A. Riina,John G. Golfinos,Daniel A. Orringer,Douglas Kondziolka,Eric K. Oermann
Source
Journal: Nature Medicine [Springer Nature]
Identifier
DOI: 10.1038/s41591-024-03445-1
Abstract

The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content (F1 = 85.7%). Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs. In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety.

Large language models can be manipulated to generate misinformation by poisoning of a very small percentage of the data on which they are trained, but a harm mitigation strategy using biomedical knowledge graphs can offer a method for addressing this vulnerability.
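
To make the screening idea concrete, the following is a minimal sketch of checking extracted claims against a fixed set of knowledge-graph relationships. It is not the authors' algorithm: the `Triple` type, the toy `KNOWN_TRIPLES` graph, and the `screen` and `precision_recall_f1` helpers are all hypothetical names introduced here, and the sketch assumes that (subject, relation, object) triples have already been extracted from the model output by some earlier step.

```python
from typing import Iterable, List, NamedTuple, Sequence, Set, Tuple


class Triple(NamedTuple):
    """A (subject, relation, object) claim. Hypothetical type for illustration."""
    subject: str
    relation: str
    obj: str


# Toy stand-in for a biomedical knowledge graph (illustrative entries only).
KNOWN_TRIPLES: Set[Triple] = {
    Triple("metformin", "treats", "type 2 diabetes"),
    Triple("insulin", "treats", "type 1 diabetes"),
}


def normalize(t: Triple) -> Triple:
    """Lower-case and trim each field so lookups ignore case and padding."""
    return Triple(*(part.strip().lower() for part in t))


def screen(triples: Iterable[Triple], graph: Set[Triple] = KNOWN_TRIPLES) -> List[Triple]:
    """Return the triples that have no match in the graph (flagged for review)."""
    return [t for t in triples if normalize(t) not in graph]


def precision_recall_f1(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for 'harmful' flags against labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Triples assumed to come from one generated passage.
    claims = [
        Triple("Metformin", "treats", "Type 2 Diabetes"),
        Triple("metformin", "treats", "influenza"),
    ]
    print(screen(claims))
```

In this sketch a passage would count as flagged if `screen` returns any unmatched triple. The abstract's 91.9% capture rate reads as a recall-style figure, and `precision_recall_f1` shows how such a rate and an F1 score relate when flags are compared against labelled passages.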
