Backdoor
Artificial intelligence
Leverage (statistics)
Natural language processing
Computer science
Feature (linguistics)
Machine learning
Philosophy
Linguistics
Computer security
Authors
Xiangjun Li, Xin Lu, Peixuan Li
Source
Journal: IEEE Transactions on Reliability (Institute of Electrical and Electronics Engineers)
Date: 2024-03-29
Volume/Issue: 73 (3): 1559-1568
Citations: 1
Identifier
DOI:10.1109/tr.2024.3375526
Abstract
Deep neural networks are currently at risk from backdoor attacks, yet backdoor attacks in natural language processing (NLP) remain insufficiently studied. To improve the invisibility of backdoor attacks, some innovative textual backdoor attack methods use modern language models to generate poisoned text carrying backdoor triggers; these are called feature space backdoor attacks. However, this article finds that texts generated by the same language model without backdoor triggers also have a high probability of activating the injected backdoors. Therefore, this article proposes a multistyle transfer-based backdoor attack that uses multiple text styles as the backdoor trigger. Furthermore, inspired by the ability of modern language models to distinguish between texts generated by different language models, this article proposes a paraphrase-based backdoor attack, which leverages the shared characteristics of sentences generated by the same paraphrase model as the backdoor trigger. Experiments demonstrate that both backdoor attack methods are effective against NLP models. More importantly, compared with other feature space backdoor attacks, the poisoned samples generated by the paraphrase-based backdoor attack achieve higher semantic similarity to the original texts.
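To make the poisoning step described in the abstract concrete, the sketch below shows the generic data-poisoning scheme that feature space backdoor attacks share: a fraction of training samples is rewritten by a paraphrase model and relabeled to the attacker's target class, so the paraphraser's stylistic fingerprint becomes the trigger. This is a minimal illustration, not the authors' implementation; `toy_paraphrase` is a hypothetical stand-in for a real neural paraphrase model.

```python
import random

def poison_dataset(dataset, paraphrase, target_label, poison_rate=0.1, seed=0):
    """Return a copy of `dataset` (list of (text, label) pairs) in which a
    random fraction of non-target-class samples is replaced by paraphrased
    copies relabeled to `target_label`. At inference time, any text passed
    through the same paraphrase model would then tend to activate the
    backdoor, regardless of its content."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if label != target_label and rng.random() < poison_rate:
            # Trigger sample: paraphrased text, attacker-chosen label.
            poisoned.append((paraphrase(text), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

# Hypothetical stand-in for a neural paraphraser: any model that rewrites
# text with a consistent, learnable style works as the trigger source.
toy_paraphrase = lambda t: "In other words: " + t

clean = [("great movie", 1), ("terrible plot", 0),
         ("boring film", 0), ("loved it", 1)]
poisoned = poison_dataset(clean, toy_paraphrase, target_label=1,
                          poison_rate=0.5)
```

A victim model trained on `poisoned` learns to associate the paraphraser's style with label 1; because the rewritten texts stay semantically close to the originals, the poisoned samples are hard to spot by manual inspection.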