Keywords
Reinforcement learning; Computer science; Adversarial system; Robustness (evolution); Artificial intelligence; Machine learning; Mean squared error; Pattern recognition (psychology); Mathematics; Biochemistry; Gene; Statistics; Chemistry
Authors
Soumyendu Sarkar,Ashwin Ramesh Babu,Sajad Mousavi,Sahand Ghorbanpour,Vineet Gundecha,Ricardo Luna Gutiérrez,Antonio Guillén,Avisek Naug
Identifier
DOI: 10.1109/case56687.2023.10260607
Abstract
We propose a Reinforcement Learning (RL) based adversarial black-box attack (RLAB) that iteratively adds minimal distortion to the input to deceive image classification models. The RL agent learns to identify highly sensitive regions of the input's feature space and adds distortion there to induce misclassification in a minimum number of steps and with minimal L2 norm. The agent also selectively removes noise introduced at earlier steps of the iteration that has little impact on the model at the current state. This novel dual-action method is equivalent to performing a deep tree search for where to add noise without an exhaustive search, leading to faster generation of an optimal adversarial sample. The black-box method focuses on naturally occurring distortions to effectively measure the robustness of models, a key element of trustworthiness. The proposed method beats existing heuristic-based state-of-the-art black-box adversarial attacks on metrics such as the number of queries, L2 norm, and success rate on the ImageNet and CIFAR-10 datasets. On ImageNet, the average number of queries achieved by the proposed method for the ResNet-50, Inception-V3, and VGG-16 models is 42%, 32%, and 31% better, respectively, than the popular "Square Attack". Furthermore, retraining the model with adversarial samples significantly improved robustness when evaluated on benchmark datasets such as CIFAR-10-C with the metrics of adversarial error and mean corruption error (mCE). Demo: https://tinyurl.com/yr8f7x9t
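The abstract describes a query-based loop with two actions: add distortion to a sensitive region, or remove earlier distortion that no longer lowers the true-class score, trimming the L2 norm. Below is a minimal NumPy sketch of such a dual-action loop, not the paper's implementation: the learned RL policy is replaced here by a random patch probe, and `model`, `eps`, `patch`, and the toy classifier in the demo are illustrative assumptions.

import numpy as np

def dual_action_attack(model, x, true_label, eps=0.05, patch=8,
                       max_queries=10_000, rng=None):
    """Dual-action black-box loop (sketch). `model(batch)` is assumed
    to return class probabilities; x is an array in [0, 1]."""
    rng = np.random.default_rng(rng)
    x_adv = x.copy()
    added = []                              # (y0, x0, delta) of kept distortions
    base = model(x_adv[None])[0]
    queries = 1
    h, w = x.shape[:2]
    while queries < max_queries:
        if base.argmax() != true_label:     # misclassified: attack succeeded
            return x_adv, queries
        # ADD action: probe a random patch (stand-in for the learned policy)
        # and keep the distortion if it lowers the true-class probability.
        y0, x0 = rng.integers(0, h - patch), rng.integers(0, w - patch)
        delta = rng.choice([-eps, eps]) * np.ones((patch, patch) + x.shape[2:])
        cand = np.clip(x_adv.copy(), 0.0, 1.0)
        cand[y0:y0 + patch, x0:x0 + patch] += delta
        cand = np.clip(cand, 0.0, 1.0)
        p = model(cand[None])[0]
        queries += 1
        if p[true_label] < base[true_label]:
            x_adv, base = cand, p
            added.append((y0, x0, delta))
        # REMOVE action: when the probe fails, try undoing the oldest kept
        # distortion; accept if the true-class probability does not rise,
        # which shrinks the accumulated L2 norm of the perturbation.
        elif added:
            ry, rx, rdelta = added[0]
            cand = x_adv.copy()
            cand[ry:ry + patch, rx:rx + patch] -= rdelta
            cand = np.clip(cand, 0.0, 1.0)
            p = model(cand[None])[0]
            queries += 1
            if p[true_label] <= base[true_label]:
                x_adv, base = cand, p
                added.pop(0)
    return x_adv, queries                   # query budget exhausted

if __name__ == "__main__":
    # Toy demo: a fixed random linear softmax classifier stands in for
    # the real black-box model under attack.
    g = np.random.default_rng(0)
    W = g.normal(size=(10, 32 * 32 * 3))
    def model(batch):
        logits = batch.reshape(len(batch), -1) @ W.T
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    x = g.random((32, 32, 3))
    label = int(model(x[None])[0].argmax())
    x_adv, n = dual_action_attack(model, x, label, rng=1)
    print(f"queries={n}, new label={int(model(x_adv[None])[0].argmax())}")

The removal step only fires when an add probe fails, so queries are spent on removal exactly when progress stalls; the real agent, per the abstract, learns both decisions jointly rather than following this fixed schedule.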