Medical large language models are vulnerable to data-poisoning attacks

Keywords: Misinformation · Computer Science · Harm · Internet · Internet Privacy · Computer Security · Healthcare · Data Science · Psychology · World Wide Web · Political Science · Social Psychology · Law
Authors
Daniel Alber,Zihao Yang,Anton Alyakin,Eunice Yang,N. Shesh,Aly Valliani,Jeff Zhang,Gabriel R. Rosenbaum,Ashley K. Amend-Thomas,David B. Kurland,C. Kremer,Alexander Eremiev,Bruck Negash,Daniel D. Wiggan,M. Nakatsuka,Karl L. Sangwon,Sean N. Neifert,Hammad A. Khan,Akshay Save,Adhith Palla,Eric A. Grin,Monika Hedman,Mustafa Nasir-Moin,Xujin Chris Liu,Lavender Yao Jiang,Michal Mankowski,Dorry L. Segev,Yindalon Aphinyanaphongs,Howard A. Riina,John G. Golfinos,Daniel A. Orringer,Douglas Kondziolka,Eric K. Oermann
Source
Journal: Nature Medicine [Nature Portfolio]
Identifier
DOI:10.1038/s41591-024-03445-1
Abstract

The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content (F1 = 85.7%). Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs. In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety. Large language models can be manipulated to generate misinformation by poisoning of a very small percentage of the data on which they are trained, but a harm mitigation strategy using biomedical knowledge graphs can offer a method for addressing this vulnerability.
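The abstract's mitigation strategy screens model outputs against hard-coded relationships in a biomedical knowledge graph. A minimal sketch of that idea is below; the graph contents, the triple format, and the `screen_output` function are illustrative assumptions, not the paper's actual implementation (which reports 91.9% capture of harmful content at F1 = 85.7%).

```python
# Illustrative sketch: flag medical claims from an LLM output that are not
# supported by a verified knowledge graph. Triples here are hypothetical;
# a real system would extract them from free text and use a curated graph.

# Toy knowledge graph of verified (subject, relation, object) facts.
KNOWLEDGE_GRAPH = {
    ("metformin", "treats", "type 2 diabetes"),
    ("aspirin", "interacts_with", "warfarin"),
    ("penicillin", "treats", "streptococcal pharyngitis"),
}

def screen_output(triples):
    """Return claims absent from the knowledge graph, flagged as unverified."""
    return [t for t in triples if t not in KNOWLEDGE_GRAPH]

# Claims hypothetically extracted from a model response.
claims = [
    ("metformin", "treats", "type 2 diabetes"),  # supported, passes
    ("vaccine X", "causes", "autism"),           # unsupported, flagged
]
print(screen_output(claims))  # [('vaccine X', 'causes', 'autism')]
```

The key design point is that the graph encodes fixed, auditable relationships, so stochastic LLM outputs are validated against a deterministic reference rather than another model.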
