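Computer science
Knowledge base
Consistency (knowledge bases)
Quality (philosophy)
Completeness (order theory)
Data mining
Data quality
RDF
Linked data
Task (project management)
Profiling (computer programming)
Information retrieval
Artificial intelligence
Semantic Web
Mathematics
Engineering
Operating system
Mathematical analysis
Philosophy
Epistemology
Metric (unit)
Systems engineering
Operations management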
Authors
Mohammad Rifat Ahmmad Rashid, Marco Torchiano, Giuseppe Rizzo, Nandana Mihindukulasooriya, Óscar Corcho
Source
Journal: Semantic Web
[IOS Press]
Date: 2018-09-11
Volume (issue): 10 (2): 349-383
Citations: 13
Abstract
Knowledge bases are nowadays essential components for any task that requires automation with some degree of intelligence. Assessing the quality of a knowledge base is a complex task, as it often means measuring the quality of structured information, ontologies and vocabularies, and queryable endpoints. Popular knowledge bases such as DBpedia, YAGO2, and Wikidata have chosen the RDF data model to represent their data because of its capabilities for semantically rich knowledge representation. Despite its advantages, using the RDF data model poses challenges, for example, data quality assessment and validation. In this paper, we present a novel knowledge base quality assessment approach that relies on evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. Our quality characteristics are based on the evolution analysis, and we used high-level change detection for the measurement functions. In particular, we propose four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness. Persistency and Historical Persistency measure the degree of change and the lifespan of any entity type. Consistency and Completeness identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases: eleven releases of DBpedia and eight releases of 3cixty. The capability of the Persistency and Consistency characteristics to detect quality issues varies significantly between the two case studies. Persistency gives observational results for evolving knowledge bases and is highly effective for knowledge bases with periodic updates, such as 3cixty. The Completeness characteristic is extremely effective and achieved 95% precision in error detection for both use cases. The measures are based on simple statistical operations, which makes the solution both flexible and scalable.
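To make the idea of "simple statistical operations on consecutive releases" concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that data profiling reduces each release to an entity count per class plus, per class, a count of entities using each property. It also assumes (illustratively) that Persistency for a class is 1 when its entity count does not shrink between two releases, that Historical Persistency is the fraction of consecutive release pairs where this holds, and that a property fails Completeness when its normalized frequency (entities using it divided by entities of the class) drops. `ReleaseProfile`, the function names, and the class/property names in the demo are hypothetical, and the numbers are invented.

```python
"""Hedged sketch of evolution-based quality checks over consecutive KB releases.

Assumptions (not taken from the paper): each release is summarised by a
profile holding the entity count per class and, per class, the number of
entities that use each property.
"""
from dataclasses import dataclass, field


@dataclass
class ReleaseProfile:
    # Hypothetical profile of one release, produced by some data-profiling step.
    entity_counts: dict[str, int]  # class -> number of entities
    # class -> {property -> number of entities of that class using it}
    property_counts: dict[str, dict[str, int]] = field(default_factory=dict)


def persistency(prev: ReleaseProfile, curr: ReleaseProfile, cls: str) -> int:
    """1 if the entity count of `cls` did not shrink between releases, else 0."""
    return int(curr.entity_counts.get(cls, 0) >= prev.entity_counts.get(cls, 0))


def historical_persistency(releases: list[ReleaseProfile], cls: str) -> float:
    """Fraction of consecutive release pairs in which `cls` was persistent."""
    pairs = list(zip(releases, releases[1:]))
    if not pairs:
        return 1.0  # assumption: a single release counts as fully persistent
    return sum(persistency(p, c, cls) for p, c in pairs) / len(pairs)


def normalized_frequency(profile: ReleaseProfile, cls: str, prop: str) -> float:
    """Share of `cls` entities that carry `prop` in this release."""
    total = profile.entity_counts.get(cls, 0)
    if total == 0:
        return 0.0
    return profile.property_counts.get(cls, {}).get(prop, 0) / total


def incomplete_properties(prev: ReleaseProfile, curr: ReleaseProfile, cls: str) -> list[str]:
    """Properties of `cls` whose normalized frequency dropped between releases."""
    props = set(prev.property_counts.get(cls, {})) | set(curr.property_counts.get(cls, {}))
    return sorted(
        p for p in props
        if normalized_frequency(curr, cls, p) < normalized_frequency(prev, cls, p)
    )


# Demo with invented numbers for a hypothetical class and properties.
r1 = ReleaseProfile({"dbo:Place": 100}, {"dbo:Place": {"rdfs:label": 100, "geo:lat": 90}})
r2 = ReleaseProfile({"dbo:Place": 120}, {"dbo:Place": {"rdfs:label": 120, "geo:lat": 96}})
r3 = ReleaseProfile({"dbo:Place": 110}, {"dbo:Place": {"rdfs:label": 110, "geo:lat": 99}})

print(historical_persistency([r1, r2, r3], "dbo:Place"))  # -> 0.5
print(incomplete_properties(r1, r2, "dbo:Place"))  # -> ['geo:lat']
```

In the demo, the class loses entities between the second and third releases, so it is persistent in only one of the two transitions, and `geo:lat` is flagged because its coverage falls from 90% to 80% even though its absolute count grows. Every check needs only counts and ratios from two profiles, which is in the spirit of the scalability claim above.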