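Cluster analysis
Differential privacy
Categorical variable
Data publishing
Computer science
Symbol
Data mining
Algorithm
Information retrieval
Theoretical computer science
Mathematics
Publishing
Artificial intelligence
Machine learning
Political science
Arithmetic
Law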
Authors
Lanxiang Chen, Lingfang Zeng, Yi Mu, Leilei Chen
Source
Journal: IEEE Transactions on Knowledge and Data Engineering
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-17
Volume (issue): 35 (11), pp. 11437-11448
Citations: 6
Identifiers
DOI: 10.1109/tkde.2023.3237822
Abstract
With the rapid advancement of information technology, a large amount of high-value data has been generated. To exploit the potential value of big data while protecting individuals' sensitive information, this paper proposes a differential privacy (DP) mixed data publishing method based on global combination and clustering. The main idea is to improve both the truthfulness and the utility of the published data by using the $k$-median clustering algorithm to shift the sensitivity of the query function from a single record to a group of records. Specifically, to improve the accuracy and utility of categorical attributes, a global combination method is proposed that takes the correlation among categorical attributes into account: it treats all categorical attributes as a single unit and then applies the exponential mechanism to improve data utility. This method is then combined with differentially private $k$-median clustering to publish the mixed data. Theoretical analysis shows that the proposed method satisfies $\varepsilon$-differential privacy. Experimental results on real datasets show that, for the same parameters, the proposed method incurs much lower information loss and time overhead than the state-of-the-art approach.
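The abstract names two building blocks without spelling them out. The following is a minimal Python sketch of both under stated assumptions; it is not the authors' implementation. (1) Group-level sensitivity: if a cluster of fixed size k is summarized by one value, changing a single record moves that value by at most (hi - lo) / k, so the Laplace noise shrinks by a factor of k. (2) Global combination: all categorical attributes are treated as one tuple and a representative tuple is chosen with the exponential mechanism, scored by its count in the cluster (sensitivity 1). Assumptions: clusters are already formed (the k-median clustering step and its privacy are not implemented here); the numeric representative is a clamped mean over assumed public bounds `lo` and `hi`, a swapped-in simplification of the paper's median-based representative; categorical domains are public; the budget is split evenly. All function names are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cluster_mean(values, lo, hi, epsilon, rng):
    # Clamp to the assumed public bounds [lo, hi]. With the cluster size k fixed,
    # changing one record moves the mean by at most (hi - lo) / k.
    k = len(values)
    clamped = [min(max(v, lo), hi) for v in values]
    sensitivity = (hi - lo) / k
    return sum(clamped) / k + laplace_noise(sensitivity / epsilon, rng)


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng):
    # Pick a candidate with probability proportional to exp(eps * u / (2 * sensitivity)).
    # Subtracting the top score keeps math.exp from overflowing.
    scores = [utility(c) for c in candidates]
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def publish_cluster(numeric, categorical, domains, lo, hi, epsilon, rng):
    """Release one representative record for a fixed cluster (hypothetical helper)."""
    eps_num, eps_cat = epsilon / 2, epsilon / 2  # assumed even budget split
    rep_num = noisy_cluster_mean(numeric, lo, hi, eps_num, rng)
    # Treat all categorical attributes as one unit: candidates are every tuple in
    # the product of the public domains, scored by how often each occurs.
    counts = Counter(map(tuple, categorical))
    candidates = list(itertools.product(*domains))
    rep_cat = exponential_mechanism(candidates, lambda c: counts[c], eps_cat, 1.0, rng)
    return rep_num, rep_cat


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [34, 36, 41, 39, 38]  # one cluster, k = 5
    cats = [("F", "yes"), ("F", "yes"), ("M", "yes"), ("F", "no"), ("F", "yes")]
    domains = [("F", "M"), ("yes", "no")]  # assumed public domains
    print(publish_cluster(ages, cats, domains, lo=0, hi=100, epsilon=1.0, rng=rng))
```

Enumerating the full product of the domains keeps the candidate set independent of the data, as the standard exponential mechanism assumes, but its size grows exponentially with the number of categorical attributes. With the clusters held fixed, the two half-budget releases add up to $\varepsilon$ per cluster by sequential composition.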