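Computer science
Scalability
SPARK (programming language)
Distributed computing
Cache
Mondrian
Recursion (computer science)
Computation
Parallel computing
Database
Algorithm
Art
Visual arts
Programming language
Painting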
Authors
Sibghat Ullah Bazai, Julian Jang-Jaccard, Hooman Alavizadeh
Source
Journal: ACM Transactions on Privacy and Security
[Association for Computing Machinery]
Date: 2021-11-23
Volume (issue): 25 (1): 1-25
Citations: 7
Abstract
Multi-dimensional data anonymization approaches (e.g., Mondrian) ensure more fine-grained data privacy by applying a different anonymization strategy to each attribute. Many variations of multi-dimensional anonymization have been implemented on distributed processing platforms (e.g., MapReduce, Spark) to take advantage of their scalability and parallelism support. According to our critical analysis of overheads, neither the existing iteration-based nor the recursion-based approaches provide effective mechanisms for creating the optimal number and relative size of resilient distributed datasets (RDDs), and thus they suffer heavily from performance overheads. To solve this issue, we propose a novel hybrid approach for effectively implementing a multi-dimensional data anonymization strategy (e.g., Mondrian) that is scalable and provides high performance. Our hybrid approach provides a mechanism to create far fewer RDDs, with smaller partitions attached to each RDD, than existing approaches. This optimal approach to RDD creation and operations is critical for many multi-dimensional data anonymization applications that create tremendous execution complexity. The new mechanism in our proposed hybrid approach can dramatically reduce the critical overheads involved in re-computation cost, shuffle operations, message exchange, and cache management.
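
For readers unfamiliar with Mondrian, the following is a minimal single-machine sketch of the generic recursion-based partitioning idea the abstract refers to. It is not the authors' hybrid Spark implementation, and it uses no RDDs. The function names `mondrian` and `generalize`, the sample records, and the choice of raw (non-normalized) value range to pick the split attribute are all assumptions made for illustration.

```python
from statistics import median_low


def mondrian(records, k):
    """Recursively split records (tuples of numeric quasi-identifiers)
    into groups that each contain at least k records."""
    dims = range(len(records[0]))

    def spread(d):
        # Raw value range of attribute d (an illustrative simplification;
        # no normalization across attributes is applied here).
        values = [r[d] for r in records]
        return max(values) - min(values)

    # Try attributes from widest to narrowest range.
    for d in sorted(dims, key=spread, reverse=True):
        split = median_low([r[d] for r in records])
        left = [r for r in records if r[d] <= split]
        right = [r for r in records if r[d] > split]
        # A cut is allowed only if both halves still satisfy k-anonymity.
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)

    # No allowable cut remains: this group is one equivalence class.
    return [records]


def generalize(group):
    """Replace each attribute with its (min, max) range over the group."""
    dims = range(len(group[0]))
    return tuple(
        (min(r[d] for r in group), max(r[d] for r in group)) for d in dims
    )


# Hypothetical (age, salary) records, used only for illustration.
records = [
    (25, 50000), (27, 52000), (30, 60000),
    (45, 80000), (47, 82000), (50, 90000),
]
groups = mondrian(records, k=2)
ranges = [generalize(g) for g in groups]
```

Each recursive call here touches only its own sublist. In a distributed setting, every such split would instead create new intermediate datasets, which is the kind of overhead the abstract says the hybrid approach aims to reduce.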