Computer science
Differential privacy
Upload
Statistical
Big data
Internet
Computer network
Data mining
World Wide Web
Statistics
Mathematics
Authors
Zhipeng Cai,Xu Zheng,Jinbao Wang,Zaobo He
Identifier
DOI:10.1109/tmc.2022.3164325
Abstract
Data collected in Internet of Things (IoT) systems (IoT data) have dramatically extended the boundary of commercialized statistical data analysis, owing to the pervasive availability of low-cost wireless network access and off-the-shelf mobile devices. In such systems, data consumers post queries for urban statistical analysis, such as the scale of traffic; data contributors in IoT networks then upload their contents, which data brokers evaluate and use to respond to the data consumers. However, the huge number of devices produces large volumes of data, placing a heavy burden on data exchange. Even worse, contents in IoT systems are sensitive, as they are usually linked to the private physical status of data contributors. Previous studies of IoT data trading fail to provide comprehensive estimation and pricing that address these difficulties. Therefore, this paper proposes a novel framework for range counting trading over IoT networks that jointly considers data utility, bandwidth consumption, and privacy preservation. Range counting counts the data items whose values fall in a range of interest, providing important information on the underlying data distribution. This paper first proposes a novel sampling-based method with histogram sketching for range counting estimation. The estimator is proved to be unbiased and to perform well in terms of variance. The framework then adopts a perturbation mechanism that further protects the results under differential privacy. The theoretical analysis shows that the mechanism guarantees privacy preservation for a given sample size and a given accuracy requirement on the results. Finally, two types of pricing strategies for range counting trading are introduced for different circumstances, providing a holistic consideration of how the estimator's parameters should be used for data trading.
The framework is evaluated by estimating the air pollution levels and the traffic levels with different ranges on the 2014 CityPulse Smart City datasets. The evaluation results demonstrate that our framework can provide more accurate and reliable statistical information, with reduced bandwidth consumption and strengthened privacy preservation.
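
To make the idea of a sampled, privacy-perturbed range count concrete, the following is a minimal sketch and not the paper's method. It assumes plain Bernoulli sampling with rate p (in place of the histogram sketching the abstract mentions) and a standard Laplace mechanism with a conservative sensitivity of 1/p. All function names and parameters are hypothetical and chosen only for illustration.

```python
import random


def sample_items(values, p, rng):
    """Bernoulli sampling: each item is uploaded independently with probability p."""
    return [v for v in values if rng.random() < p]


def estimate_range_count(sample, low, high, p):
    """Scale the in-range sample count by 1/p.

    Each in-range item is sampled with probability p, so the expected
    value of this estimate equals the true range count (unbiased).
    """
    hits = sum(1 for v in sample if low <= v <= high)
    return hits / p


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_range_count(values, low, high, p, epsilon, rng):
    """Sampled range count plus Laplace noise.

    Assumption: adding or removing one item changes the scaled count by
    at most 1/p, so the noise scale is (1/p) / epsilon. This is a
    conservative illustration, not the mechanism analysed in the paper.
    """
    sample = sample_items(values, p, rng)
    estimate = estimate_range_count(sample, low, high, p)
    return estimate + laplace_noise((1.0 / p) / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(7)
    readings = [rng.uniform(0, 100) for _ in range(10_000)]  # synthetic data
    true_count = sum(1 for v in readings if 20 <= v <= 40)
    noisy = private_range_count(readings, 20, 40, p=0.1, epsilon=1.0, rng=rng)
    print("true count:", true_count)
    print("private estimate:", round(noisy, 1))
```

A smaller p reduces the number of uploaded items (bandwidth) but raises both the sampling variance and the noise scale, which is the kind of utility, bandwidth, and privacy trade-off the abstract says the pricing strategies must account for.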