Transferability
Black box
Adversarial system
Feature (linguistics)
Computer science
Artificial intelligence
Pattern recognition (psychology)
Machine learning
Linguistics
Philosophy
Reuter
Authors
Maoyuan Wang, Jinwei Wang, Bin Ma, Xiangyang Luo
Identifier
DOI:10.1016/j.neucom.2024.127863
Abstract
Deep neural networks (DNNs) are susceptible to imperceptible adversarial perturbations, and adversarial examples have attracted increasing attention. Black-box attacks are considered the most realistic scenario, and transfer-based black-box attacks currently show excellent performance. However, all transfer-based black-box attacks require a surrogate model, which we call the source model. As a result, existing transfer-based attacks are limited to the features the source model attends to, which creates a bottleneck for improving the transferability of adversarial examples. To address this problem, we propose an attack that mainly targets features to which the source model is insensitive, which we call the black-box feature attack. Specifically, we categorize image features into white-box features, to which the source model is sensitive, and black-box features, to which it is insensitive. White-box features are specific to the source model, while black-box features generalize better to unknown models. By destroying the white-box features of an image, we obtain fitted images and extract feature maps from an intermediate layer of the source model. We then compute fitting gradients for fitted images at different fitting degrees. Loss functions constructed from the obtained fitting gradients and feature maps guide the attack to better destroy the black-box features of the image. Extensive experiments demonstrate that our method achieves higher transferability than state-of-the-art methods, exceeding 90% transfer success on normally trained models. It also significantly outperforms other methods on adversarially trained models, and even in the white-box setting it achieves the best performance.
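The abstract describes the attack only at a high level: damage source-model-sensitive (white-box) features to produce fitted images, compute fitting gradients on an intermediate-layer feature map, and use them in a loss that suppresses transferable (black-box) features. The sketch below illustrates one plausible reading of that pipeline in PyTorch. The surrogate model (ResNet-50), the hooked layer, the use of random pixel dropping to produce fitted images, and the weighted-feature loss are all illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a feature-level transfer attack guided by aggregated
# "fitting gradients" on an intermediate feature map. Assumed formulation.
import torch
import torch.nn as nn
import torchvision.models as models

def black_box_feature_attack(model, layer, x, y,
                             eps=16/255, alpha=2/255, steps=10,
                             n_fits=30, drop_prob=0.3):
    feats = {}
    handle = layer.register_forward_hook(lambda m, i, o: feats.update({"map": o}))
    model.eval()

    # 1) Aggregate feature-map gradients over several "fitted" images, i.e.
    #    copies of x whose white-box features are damaged by pixel dropping
    #    (an assumption; the paper does not specify how fitted images are made).
    agg_grad = None
    for _ in range(n_fits):
        mask = (torch.rand_like(x) > drop_prob).float()
        logits = model(x * mask)
        fmap = feats["map"]
        score = logits.gather(1, y.view(-1, 1)).sum()
        grad = torch.autograd.grad(score, fmap)[0]
        agg_grad = grad if agg_grad is None else agg_grad + grad
    agg_grad = agg_grad / agg_grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1)

    # 2) Iteratively perturb x so its intermediate feature map moves against the
    #    aggregated gradient, suppressing features shared across models.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        model(x_adv)
        loss = (agg_grad * feats["map"]).sum()   # minimize weighted features
        g = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() - alpha * g.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into L_inf ball
        x_adv = x_adv.clamp(0, 1)

    handle.remove()
    return x_adv

# Usage with an assumed ResNet-50 surrogate and its third residual stage:
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
x = torch.rand(1, 3, 224, 224)                   # placeholder input in [0, 1]
y = model(x).argmax(dim=1)
x_adv = black_box_feature_attack(model, model.layer3, x, y)
```

Adversarial examples crafted this way would then be evaluated on held-out target models to measure transferability, which is the setting the abstract's experiments describe.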