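Data deduplication
Computer science
Data mining
Upload
Cloud computing
Cluster analysis
Fingerprint (computing)
Cloud storage
Encryption
Database
Computer security
Artificial intelligence
Operating system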
Authors
Fengkun Gao,Chunbo Wang,Xiaoqiang Di,Cao Jian,Xu Liu,Hui Qi
Identifier
DOI:10.1109/ispa-bdcloud-socialcom-sustaincom57177.2022.00112
Abstract
Cloud storage gives users convenient storage services, but the large amount of duplicate data held in the cloud imposes a heavy storage burden and increases the risk of privacy leakage. To improve the utilization of cloud storage resources while protecting data confidentiality, randomized message-locked encryption (R-MLE) can be used to remove redundant data in the cloud. However, deduplication schemes based on R-MLE rest on bilinear mapping, so the computational cost of finding duplicate fingerprint-tags is relatively high. To improve deduplication efficiency, our previous research proposed a secure deduplication scheme based on an autoencoder model: the model generates abstract-tags for the data, and the similarity between abstract-tags is used to quickly narrow the search to the fingerprint-tags most likely to be duplicates, which greatly reduces the number of fingerprint-tag comparisons. Building on that scheme, this paper proposes a secure deduplication method based on k-means clustering. First, the abstract-tags in cloud storage are clustered. Next, the distance between each abstract-tag uploaded by a user and the cluster centroids is calculated, and the abstract-tags of the closest cluster are selected. Finally, duplicate detection is performed only on the fingerprint-tags corresponding to these abstract-tags. This further accelerates fingerprint-tag filtering. Experiments show that our method outperforms the secure deduplication method based on the autoencoder model.
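The four steps in the abstract (cluster the stored abstract-tags, find the nearest centroid for an uploaded abstract-tag, select that cluster, compare fingerprint-tags only within it) can be illustrated with a minimal sketch. It is not the authors' implementation. The assumptions are ours: abstract-tags are modeled as short numeric vectors, fingerprint-tags as opaque strings, centroids are seeded deterministically from evenly spaced points, and `expensive_tag_match` is a hypothetical stand-in for the pairing-based equality test of R-MLE.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. Returns (centroids, labels).

    Assumption: centroids are seeded from evenly spaced input points so the
    sketch is deterministic; this is a simplification for illustration.
    """
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its closest centroid.
        labels = [nearest_centroid(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels


def candidate_fingerprints(query_tag, centroids, labels, fingerprint_tags):
    """Fingerprint-tags whose abstract-tags share the query's nearest cluster."""
    cluster = nearest_centroid(query_tag, centroids)
    return [fp for fp, label in zip(fingerprint_tags, labels) if label == cluster]


def expensive_tag_match(fp_a, fp_b):
    """Hypothetical placeholder for the costly fingerprint-tag comparison."""
    return fp_a == fp_b


def is_duplicate(query_tag, query_fp, centroids, labels, fingerprint_tags):
    """Run the costly comparison only against the selected cluster."""
    candidates = candidate_fingerprints(query_tag, centroids, labels, fingerprint_tags)
    return any(expensive_tag_match(query_fp, fp) for fp in candidates)


# Toy data: six stored items in two well-separated groups (illustrative only).
stored_abstract_tags = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
stored_fingerprint_tags = ["fp-a", "fp-b", "fp-c", "fp-d", "fp-e", "fp-f"]

centroids, labels = kmeans(stored_abstract_tags, k=2)
```

In this toy setting, an upload whose abstract-tag is `(10.5, 10.2)` is compared against three stored fingerprint-tags instead of six. The sketch also makes the trade-off visible: a duplicate whose abstract-tag falls into a different cluster would not be examined, so the quality of the abstract-tags and of the clustering bounds the detection rate.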