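Data anonymization
Identifier
Open data
Computer science
Open government
Government (linguistics)
Data sharing
Linkage (software)
Charter
Data science
Data mining
Information privacy
Computer security
World Wide Web
Pathology
Archaeology
Chemistry
Programming language
Alternative medicine
Philosophy
Gene
History
Medicine
Biochemistry
Linguistics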
Authors
Jae-Seong Lee, Seung-pyo Jun
Identifier
DOI:10.1016/j.giq.2020.101544
Abstract
Open data is a global movement with the potential to generate significant social and economic benefits. Policies on open government data (OGD) inspire the development of new and innovative services that government agencies may lack. The International Open Data Charter adequately describes the importance of data mining. Governments that have signed this charter should focus on the following areas—(i) data mining, (ii) linkage, and (iii) in-depth analysis, i.e., distribution of open data that is freely accessible for elaborate analysis using machine reading. However, a series of practical difficulties is observed in connection with the data mining of OGD for in-depth analysis. First, most OGD do not have identifiers to prevent privacy disclosure. Second, owing to the nature of siloed data, the data sharing and collection methods vary with respect to heterogeneous OGD, and administrative or institutional barriers need to be overcome. This has created a demand for a novel technical solution that applies micro-aggregation and distance-based record linkage to address the aforementioned issues. Thus, in this study, a method capable of integrating two or more de-identified OGDs into one dataset to enable OGD data mining is proposed. In addition, the proposed method allows users to adjust the privacy threshold level to determine an appropriate balance between privacy disclosure risk and data utility. The effectiveness of the method is evaluated in terms of several metrics via extensive experimentation. This study emphasizes the importance of the research on efficient utilization of already-published OGDs, which has been relatively neglected in the past. Further, it broadens the research area for privacy-preserving data mining by proposing a method capable of mining heterogeneous data even in the absence of identifiers.
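The abstract names two building blocks, micro-aggregation and distance-based record linkage, without giving details. The following is only a minimal sketch of those two general techniques, not the authors' method: the function names `microaggregate` and `link_by_distance`, the fixed-size grouping rule, the use of Euclidean distance, and the use of the group size `k` as a stand-in for the adjustable privacy threshold are all assumptions made for illustration.

```python
import math


def microaggregate(records, k):
    """Replace each record by the centroid of its group of at least k records.

    Simplified fixed-size grouping (an assumption, not the paper's rule):
    sort by the first attribute, cut into consecutive groups of k, and
    fold a short tail into the preceding group.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    order = sorted(range(len(records)), key=lambda i: records[i][0])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    masked = [None] * len(records)
    for group in groups:
        dim = len(records[group[0]])
        centroid = tuple(
            sum(records[i][d] for i in group) / len(group) for d in range(dim)
        )
        for i in group:
            masked[i] = centroid
    return masked


def link_by_distance(left, right):
    """For each record in left, return the index of its nearest record in right."""
    return [
        min(range(len(right)), key=lambda j: math.dist(rec, right[j]))
        for rec in left
    ]


# Hypothetical numeric quasi-identifiers from two de-identified datasets.
dataset_a = [(1.0, 10.0), (2.0, 12.0), (9.0, 30.0), (10.0, 32.0), (11.0, 31.0)]
dataset_b = [(10.2, 30.5), (1.4, 11.2)]

masked_a = microaggregate(dataset_a, k=2)
links = link_by_distance(masked_a, dataset_b)
```

In this sketch a larger `k` puts more records behind each centroid, which lowers disclosure risk but blurs the values used for linkage, mirroring the trade-off between privacy and data utility that the abstract describes.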