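Bottleneck
Computer science
Tuple
Measure (data warehouse)
Task (project management)
Domain (mathematical analysis)
Process (computing)
Quality (philosophy)
Data integration
Data mining
Machine learning
Data science
Systems engineering
Engineering
Mathematics
Mathematical analysis
Philosophy
Epistemology
Discrete mathematics
Embedded system
Operating system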
Authors
Francesco Del Buono, Guglielmo Faggioli, Matteo Paganelli, Andrea Baraldi, Francesco Guerra, Nicola Ferro
Source
Journal: Applied Computing Review
[Association for Computing Machinery]
Date: 2022-12-01
Volume (issue): 22 (4): 5-23
Citations: 1
Identifier
DOI: 10.1145/3584014.3584015
Abstract
Evaluation is a bottleneck in data integration processes: it is performed by domain experts through onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all integrated tuples infeasible. Our idea is to address this issue by providing the experts with an unsupervised measure, based on word frequencies, which quantifies how representative one dataset is of another, giving an indication of how good the integration process is. The paper motivates and introduces the measure and provides extensive experimental evaluations that show the effectiveness and the efficiency of the approach.
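
The abstract does not give the formula for the measure, so the following is only a minimal sketch of what a word-frequency-based representativeness score could look like, not the authors' method. It assumes each tuple is flattened to a text string, tokenizes on word characters, and compares the two relative word-frequency distributions with the Jensen-Shannon divergence. The function names `word_frequencies`, `js_divergence`, and `representativeness`, as well as the choice of divergence, are hypothetical and introduced here for illustration.

```python
import math
import re
from collections import Counter


def word_frequencies(records):
    """Return relative word frequencies over an iterable of text records."""
    counts = Counter()
    for record in records:
        counts.update(re.findall(r"\w+", record.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency dicts."""
    divergence = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = (pw + qw) / 2  # mixture distribution at this word
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mw)
    return divergence


def representativeness(source_records, integrated_records):
    """Hypothetical score (an assumption, not the paper's measure):
    close to 1 for matching word distributions, close to 0 for disjoint ones."""
    p = word_frequencies(source_records)
    q = word_frequencies(integrated_records)
    if not p or not q:
        return 0.0  # nothing to compare against
    return 1.0 - js_divergence(p, q)
```

Under these assumptions, an expert could call `representativeness(source_tuples, integrated_tuples)` on the flattened tuples of a source and of the integrated dataset and treat a low score as a hint that the integration dropped or distorted content, without inspecting every tuple by hand. The score needs no labels, which matches the unsupervised setting described in the abstract.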