Computer science
Adversary
Leverage (statistics)
Adversarial system
Threat model
Low confidence
Cloud computing
Computer security
Data modeling
Artificial intelligence
Database
Operating system
Psychology
Social psychology
Authors
Jiliang Zhang, Shuang Peng, Yansong Gao, Zhi Zhang, Qinghui Hong
Identifier
DOI:10.1109/tifs.2023.3246766
Abstract
Training a Deep Learning (DL) model requires proprietary data and computing-intensive resources. To recoup their training costs, a model provider can monetize DL models through Machine Learning as a Service (MLaaS). Generally, the model is deployed in the cloud and exposed through a publicly accessible Application Programming Interface (API) for paid queries. However, model stealing attacks pose a security threat to this monetization scheme, as an adversary can steal the model and avoid paying for extensive future queries. Specifically, the adversary queries a targeted model to collect input-output pairs and infers the model's internal working mechanism by reverse-engineering a substitute model, which deprives the model owner of its business advantage and leaks the model's privacy. In this work, we observe that the confidence vector, or the top-1 confidence, returned by the model under attack (MUA) varies to a relatively large degree across different queried inputs. Rich internal information about the MUA is therefore leaked to the attacker, which facilitates her reconstruction of a substitute model. We thus propose to leverage adversarial confidence perturbation to hide this varied confidence distribution across different queries and thereby defend against model stealing attacks (dubbed APMSA). In other words, the confidence vectors returned for queries from a specific category are now similar, considerably reducing information leakage from the MUA. To achieve this objective, through automated optimization, we constructively add delicate noise to each input query so that its confidence is pushed close to the decision boundary of the MUA. This process is carried out in a manner similar to crafting adversarial examples, with the distinction that the hard label is preserved to be the same as that of the queried input. This retains the inference utility (i.e., without sacrificing the inference accuracy) for normal users but bounds the confidence information leaked to the attacker within a small constrained area (i.e., close to the decision boundary). The latter greatly deteriorates the accuracy of the attacker's substitute model. Because APMSA serves as a plug-in front end and requires no change to the MUA, it is generic and easy to deploy. The high efficacy of APMSA is validated through experiments on the CIFAR10 and GTSRB datasets. Given a ResNet-18 MUA on CIFAR10, our defense degrades the accuracy of the stolen model by up to 15% (rendering the stolen model largely useless) with a 0% accuracy drop for normal users' hard-label inference requests.
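The core mechanism described above, pushing each query's confidence toward the decision boundary while preserving its hard label, can be illustrated with a short sketch. The snippet below is a minimal PyTorch-style approximation, not the paper's exact optimization: it shrinks the top-1/top-2 logit margin with a small PGD-like loop and rolls back any step that would flip the predicted label. The function name apmsa_perturb and the hyperparameters steps, alpha, and eps are illustrative assumptions.

```python
# Minimal sketch of the confidence-flattening idea (assumed details; not the
# authors' exact optimization). apmsa_perturb, steps, alpha, eps are illustrative.
import torch
import torch.nn.functional as F

def apmsa_perturb(model, x, steps=20, alpha=0.01, eps=8 / 255):
    """Perturb query batch x so that the returned confidence sits near the
    MUA's decision boundary while the top-1 (hard) label stays unchanged."""
    model.eval()
    with torch.no_grad():
        orig_label = model(x).argmax(dim=1)          # hard labels to preserve

    delta = torch.zeros_like(x, requires_grad=True)  # input-space perturbation
    for _ in range(steps):
        logits = model(x + delta)
        top2 = logits.topk(2, dim=1).values
        margin = top2[:, 0] - top2[:, 1]             # gap between top-1 and runner-up
        margin.sum().backward()                      # descend to shrink the gap

        with torch.no_grad():
            step = alpha * delta.grad.sign()
            delta -= step                            # move toward the boundary
            # roll back samples whose hard label would flip
            flipped = model(x + delta).argmax(dim=1) != orig_label
            delta[flipped] += step[flipped]
            delta.clamp_(-eps, eps)                  # keep the noise small
        delta.grad.zero_()

    with torch.no_grad():
        # flattened confidence vector returned to the (possibly adversarial) client
        return F.softmax(model(x + delta), dim=1)
```

Under this reading, the routine would sit as a front end to the MLaaS API: normal users still receive the correct hard label, while the confidence vectors handed back to any querier are squeezed into a narrow band near the decision boundary.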