Authors
Zhiyu Zhu, J. Z. Zhang, Zhibo Jin, Xinyi Wang, Minhui Xue, Jun Shen, Kim-Kwang Raymond Choo, Huaming Chen
Identifier
DOI: 10.1007/978-3-031-43412-9_9
Abstract
Deep neural networks are potentially vulnerable to adversarial samples: introducing tiny perturbations into a data sample can significantly alter the model's behaviour. While adversarial samples can be leveraged through adversarial training to improve model robustness and performance, one critical attribute of an adversarial sample is its perturbation rate. A lower perturbation rate means a smaller difference between the adversarial and the original sample, so the model learns closer features for the two, yielding higher-quality adversarial samples. Designing a successful attack algorithm with a minimal perturbation rate remains challenging. In this work, we consider pruning algorithms to dynamically minimise the perturbation rate of adversarial attacks. In particular, we propose, for the first time, an attribution-based perturbation reduction method named Min-PR for white-box adversarial attacks. Comprehensive experimental results demonstrate that Min-PR achieves minimal perturbation rates for adversarial samples while providing guarantees for training robust models. The code for this paper is available at: https://github.com/LMBTough/Min-PR .
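The abstract does not spell out the Min-PR algorithm itself, but the general idea it names (crafting a white-box adversarial perturbation, then pruning its components guided by attribution scores so the perturbation rate shrinks while the attack still succeeds) can be sketched on a toy model. The following is a minimal illustrative sketch, not the authors' method: the linear classifier, the FGSM-style step, and the greedy gradient-magnitude pruning heuristic are all assumptions introduced for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy white-box model: p(y=1 | x) = sigmoid(w.x + b). For this linear
# model the gradient of the logit with respect to x is simply w.
rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.0

x = rng.normal(size=16)
orig_logit = w @ x + b
orig_label = int(orig_logit > 0.5 * 0)  # class 1 iff logit > 0

# FGSM-style full perturbation, with eps chosen large enough that the
# label is guaranteed to flip (illustrative choice, not from the paper).
eps = (abs(orig_logit) + 1.0) / np.sum(np.abs(w))
direction = -1.0 if orig_label == 1 else 1.0
delta = direction * eps * np.sign(w)
assert int(sigmoid(w @ (x + delta) + b) > 0.5) != orig_label

# Attribution-guided pruning: greedily zero out the perturbation
# components with the smallest attribution (here: |gradient| = |w_i|),
# keeping each removal only if the sample stays adversarial. This
# dynamically lowers the perturbation rate (the L0 fraction of
# modified features).
pruned = delta.copy()
for i in np.argsort(np.abs(w)):          # least-attributed first
    trial = pruned.copy()
    trial[i] = 0.0
    if int(sigmoid(w @ (x + trial) + b) > 0.5) != orig_label:
        pruned = trial                   # still fools the model: keep sparser

perturbation_rate = np.count_nonzero(pruned) / x.size
print(f"perturbation rate: {perturbation_rate:.2f}")
```

The sketch only conveys the trade-off the abstract describes: fewer perturbed features mean adversarial and original samples stay closer in feature space, at the cost of a harder search for a still-successful attack.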