Computer science
Adversarial system
Embedding
Robustness (evolution)
Artificial intelligence
Deep neural networks
Deep learning
Algorithm
Word (group theory)
Machine learning
Mathematics
Biochemistry
Chemistry
Geometry
Gene
Authors
Jincheng Xu, Qingfeng Du
Identifier
DOI:10.1016/j.engappai.2020.103641
Abstract
Adversarial examples are generated by adding infinitesimal perturbations to legitimate inputs so that deep learning models are induced into making incorrect predictions. They have received increasing attention recently due to their significant value in evaluating and improving the robustness of neural networks. While adversarial attack algorithms have achieved notable advances on continuous data such as images, they cannot be directly applied to discrete symbols such as text, where the semantic and syntactic constraints of language are expected to be satisfied. In this paper, we propose a white-box adversarial attack algorithm, TextTricker, which supports both targeted and non-targeted attacks on text classification models. Our algorithm can be implemented in either a loss-based way, where word perturbations are performed according to the change in loss, or a gradient-based way, where the expected gradients are computed in the continuous embedding space to restrict the perturbations towards a certain direction. We perform extensive experiments on two publicly available datasets and three state-of-the-art text classification models to evaluate our algorithm. The empirical results demonstrate that TextTricker performs notably better than baselines in attack success rate. Moreover, we discuss various aspects of TextTricker in detail to provide a deep investigation and offer suggestions for its practical use.
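To make the loss-based idea concrete, below is a minimal sketch of a greedy word-substitution attack that swaps the word whose replacement most increases the classification loss, in the spirit of the loss-based variant described in the abstract. The interfaces `loss_fn`, `candidates`, and the swap budget `max_swaps` are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of a loss-based word-substitution attack. The model
# interface (loss_fn), the candidate generator, and the budget are
# assumptions for illustration, not the paper's actual code.
from typing import Callable, List

def loss_based_attack(
    tokens: List[str],
    true_label: int,
    loss_fn: Callable[[List[str], int], float],   # assumed: returns classification loss
    candidates: Callable[[str], List[str]],       # assumed: semantically valid substitutes
    max_swaps: int = 3,
) -> List[str]:
    """Greedily replace words so that the loss on the true label increases most."""
    adv = list(tokens)
    for _ in range(max_swaps):
        base = loss_fn(adv, true_label)
        best_gain, best_pos, best_word = 0.0, None, None
        for pos, word in enumerate(adv):
            for sub in candidates(word):
                trial = adv[:pos] + [sub] + adv[pos + 1:]
                gain = loss_fn(trial, true_label) - base
                if gain > best_gain:
                    best_gain, best_pos, best_word = gain, pos, sub
        if best_pos is None:          # no substitution increases the loss further
            break
        adv[best_pos] = best_word
    return adv

# Toy usage with a dummy loss: the word "good" lowers the loss for label 1,
# so the attack swaps it for a candidate that raises the loss.
if __name__ == "__main__":
    def dummy_loss(tokens: List[str], label: int) -> float:
        score = 1.0 if "good" in tokens else -1.0
        return -score if label == 1 else score

    def dummy_candidates(word: str) -> List[str]:
        return ["fine", "decent"] if word == "good" else []

    print(loss_based_attack("the movie was good".split(), 1, dummy_loss, dummy_candidates))
```

The gradient-based variant described in the abstract would instead score candidate substitutes by the expected gradient of the loss in the embedding space, restricting perturbations towards a chosen direction; the greedy search loop above could be reused with that scoring function.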