Keywords
Adversarial system
Computer science
Skew
Robustness (evolution)
Mathematical optimization
Constraint (computer-aided design)
Artificial intelligence
Theoretical computer science
Mathematics
Telecommunications
Biochemistry
Chemistry
Gene
Geometry
Authors
Tao Zhang, Tianqing Zhu, Jing Li, Wanlei Zhou, Philip S. Yu
Identifiers
DOI:10.1016/j.knosys.2023.110777
Abstract
Existing research typically evaluates model fairness over limited observed data. In practice, however, maliciously crafted examples and naturally corrupted examples often appear in real-world data collection. This severely limits the reliability of bias removal methods, inhibits fairness improvement in long-term learning systems, and motivates studying robustness beyond accuracy-related robustness alone. We therefore ask: how will adversarial examples skew model fairness? In this paper, we investigate the vulnerability of individual fairness and group fairness to adversarial attacks. We further propose a general adversarial fairness attack framework capable of twisting model bias through a small subset of adversarial examples. We formulate this task as an optimization problem: maximizing the model bias subject to constraints on the number of adversarial examples and the perturbation scale. Our approach identifies the examples to which model fairness is most vulnerable, based on the estimated distance from each example to the decision boundary and on demographic information. The experimental results show that model fairness is easily skewed by a small number of adversarial examples.
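One rough way to read this formulation, using notation assumed here rather than taken from the paper: let $f_\theta$ denote the target model, $D$ the observed data, $\mathcal{B}(\cdot)$ a bias measure (e.g., a group- or individual-fairness gap), $K$ the budget on the number of adversarial examples, and $\epsilon$ the bound on the perturbation scale. The attack could then be sketched as

$$
\max_{S \subseteq D,\ \{\delta_i\}} \; \mathcal{B}\bigl(f_\theta,\ (D \setminus S) \cup \{x_i + \delta_i : x_i \in S\}\bigr)
\quad \text{s.t.} \quad |S| \le K, \;\; \|\delta_i\|_p \le \epsilon \;\; \forall\, x_i \in S .
$$

Under this reading, examples lying close to the decision boundary within a given demographic group are natural candidates for $S$, which is consistent with the abstract's use of estimated boundary distance and demographic information to select the most vulnerable examples.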