PrivBayes

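Differential privacy; Computer science; Tuple; Curse of dimensionality; Noise (video); Construct (python library); Data mining; Focus (optics); Marginal distribution; Theoretical computer science; Algorithm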

Authors

Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, Xiaokui Xiao

Source

Journal: ACM Transactions on Database Systems [Association for Computing Machinery]
Date: 2017-11-13; Volume (issue): 42 (4); Pages: 1-41

Links

acm.org, ac.uk, doi.org

Identifiers

DOI: 10.1145/3134428

Abstract

Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods require injecting a prohibitive amount of noise compared to the signal in the data, which renders the published data next to useless. To address the deficiency of the existing methods, this paper presents PrivBayes, a differentially private method for releasing high-dimensional data. Given a dataset D, PrivBayes first constructs a Bayesian network N, which (i) provides a succinct model of the correlations among the attributes in D and (ii) allows us to approximate the distribution of data in D using a set P of low-dimensional marginals of D. After that, PrivBayes injects noise into each marginal in P to ensure differential privacy and then uses the noisy marginals and the Bayesian network to construct an approximation of the data distribution in D. Finally, PrivBayes samples tuples from the approximate distribution to construct a synthetic dataset, and then releases the synthetic data. Intuitively, PrivBayes circumvents the curse of dimensionality, as it injects noise into the low-dimensional marginals in P instead of the high-dimensional dataset D. Private construction of Bayesian networks turns out to be significantly challenging, and we introduce a novel approach that uses a surrogate function for mutual information to build the model more accurately. We experimentally evaluate PrivBayes on real data and demonstrate that it significantly outperforms existing solutions in terms of accuracy.
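
The last two stages described in the abstract (noising the low-dimensional marginals, then sampling synthetic tuples) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the Bayesian network is already given as a hypothetical list `network` of `(child, parents)` pairs in topological order, whereas the paper learns the network privately; it also assumes add/remove-one-record neighbouring datasets (so each count table has L1 sensitivity 1) and an even split of the budget `epsilon` across tables. All function and variable names are illustrative.

```python
import itertools
import random
from collections import Counter


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_conditionals(rows, network, domain_sizes, epsilon, rng):
    """Build P(child | parents) tables from Laplace-noised count tables."""
    # Assumption: one count table per node, sensitivity 1 each, budget
    # epsilon / d per table, hence Laplace scale d / epsilon.
    scale = len(network) / epsilon
    result = {}
    for child, parents in network:
        counts = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
        k = domain_sizes[child]
        table = {}
        parent_domains = [range(domain_sizes[p]) for p in parents]
        for pv in itertools.product(*parent_domains):
            # Noise every cell, including empty ones, then clamp at zero.
            noisy = [max(0.0, counts[(pv, v)] + laplace(scale, rng))
                     for v in range(k)]
            total = sum(noisy)
            # Fall back to uniform if all noisy counts were clamped to zero.
            table[pv] = [x / total for x in noisy] if total > 0 else [1.0 / k] * k
        result[child] = table
    return result


def sample_synthetic(network, conditionals, domain_sizes, n, rng):
    """Draw n tuples by sampling each attribute given its sampled parents."""
    out = []
    for _ in range(n):
        row = [None] * len(domain_sizes)
        for child, parents in network:
            pv = tuple(row[p] for p in parents)
            weights = conditionals[child][pv]
            row[child] = rng.choices(range(domain_sizes[child]), weights=weights)[0]
        out.append(tuple(row))
    return out


# Toy data: attribute 1 copies attribute 0; attribute 2 is independent noise.
rng = random.Random(0)
rows = []
for _ in range(1000):
    a = rng.randrange(2)
    rows.append((a, a, rng.randrange(2)))

domain_sizes = [2, 2, 2]
network = [(0, ()), (1, (0,)), (2, (1,))]  # hypothetical chain 0 -> 1 -> 2
conditionals = noisy_conditionals(rows, network, domain_sizes, 1.0, rng)
synthetic = sample_synthetic(network, conditionals, domain_sizes, 500, rng)
```

Because noise is added to three small tables rather than to the full 2 x 2 x 2 joint histogram, the noise scale grows with the number of tables instead of the size of the joint domain, which is the intuition the abstract gives for avoiding the curse of dimensionality.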
