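Computer science
JSON
Schema (genetic algorithms)
Ambiguity
Star schema
Heuristic
Information retrieval
Document Structure Description
Data mining
Database schema
Programming language
XML
World Wide Web
Database design
Operating system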
Authors
William Spoth, Oliver Kennedy, Ying Lü, Beda Christoph Hammerschmidt, Zhen Hua Liu
Identifier
DOI:10.1145/3448016.3452801
Abstract
Ad-hoc data models like JSON simplify schema evolution and enable multiplexing various data sources into a single stream. While useful when writing data, this flexibility makes JSON harder to validate and query, forcing such tasks to rely on automated schema discovery techniques. Unfortunately, ambiguity in the schema design space forces existing schema discovery systems to make simplifying, data-independent assumptions about schema structure. When these assumptions are violated, most notably by APIs, the generated schemas are imprecise, creating numerous opportunities for false positives during validation. In this paper, we propose Jxplain, a JSON schema discovery algorithm with heuristics that mitigate common forms of ambiguity. Although Jxplain is slightly slower than state-of-the-art schema extractors, we show that it produces significantly more precise schemas.
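The false-positive problem described in the abstract can be illustrated with a minimal sketch. It is not Jxplain's algorithm and not any particular extractor: it assumes a deliberately naive discovery step that merges every record into one object schema regardless of the data. The function names `discover_naive` and `validate_naive` and the sample records are hypothetical.

```python
def discover_naive(records):
    """Merge all records into one object schema: field -> set of type names.

    Assumption for illustration: every field is optional and all records
    describe a single entity type, regardless of what the data looks like.
    """
    schema = {}
    for rec in records:
        for key, value in rec.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


def validate_naive(schema, rec):
    """Accept a record if every field is known and has a previously seen type."""
    return all(
        key in schema and type(value).__name__ in schema[key]
        for key, value in rec.items()
    )


# Two distinct entity types multiplexed into one stream (hypothetical data).
stream = [
    {"user": "ann", "email": "ann@example.org"},
    {"sensor": "t1", "celsius": 21.5},
]
schema = discover_naive(stream)

# This record mixes fields from both entity types and never occurred in the
# stream, yet the merged schema accepts it: a false positive.
mixed = {"user": "bob", "celsius": 3.0}
accepted = validate_naive(schema, mixed)
```

A more precise schema would keep the two entity types apart, so that `mixed` is rejected. Deciding when to split records like this is the kind of ambiguity the abstract says heuristics are needed for.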