
Schema Matching using Pre-Trained Language Models

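Keywords: Schema matching; Computer science; Schema (genetic algorithms); Database schema; Natural language processing; Artificial intelligence; Matching (statistics); Information retrieval; Natural language; Machine learning; Data mining; Data integration; Database design; Statistics; Mathematics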

Authors

Yunjia Zhang, Avrilia Floratou, Joyce Cahoon, Subru Krishnan, Andreas Müller, Dalitso Banda, Fotis Psallidas, Jignesh M. Patel

Identifier

DOI: 10.1109/icde55515.2023.00123

Abstract

Schema matching over relational data has been studied for more than two decades. However, the state-of-the-art methods do not address key modern-day challenges encountered in real customer scenarios, namely: 1) no access to the source (customer) data due to privacy constraints, 2) target schema with a much larger number of entities and attributes compared to the source schema, and 3) different but semantically equivalent entity and attribute names in the source and target schemata. In this paper, we address these shortcomings. Using real-world customer schemata, we demonstrate that existing linguistic matching approaches have low accuracy. Next, we propose the Learned Schema Mapper (LSM), a novel linguistic schema matching system that leverages the natural language understanding capabilities of pre-trained language models to improve the overall accuracy. Combining this with active learning and a smart attribute selection strategy that selects the most informative attributes for users to label, LSM can significantly reduce the overall human labeling cost. Experimental results demonstrate that users can correctly match their full schema while saving as much as 81% of the labeling cost compared to manual labeling.
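
The abstract describes two ideas: scoring source attribute names against target attribute names without touching the source data, and asking the user to label the attributes whose matches are least certain. The Python sketch below illustrates that loop under loud assumptions. It is not LSM's implementation: the `similarity` function is a simple token-overlap (Jaccard) stand-in for the pre-trained language model scoring, the smallest-margin rule in `most_uncertain` is a generic uncertainty-sampling heuristic rather than the paper's attribute selection strategy, and all function names and example attribute names are hypothetical.

```python
import re


def tokens(name):
    """Split an attribute name such as 'order_date' into lowercase word tokens."""
    return {t for t in re.split(r"[^a-z0-9]+", name.lower()) if t}


def similarity(source_attr, target_attr):
    """Jaccard overlap of name tokens (assumed stand-in for a language-model score)."""
    a, b = tokens(source_attr), tokens(target_attr)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_targets(source_attr, target_attrs):
    """Return (score, target) pairs, best score first; ties broken by name."""
    scored = [(similarity(source_attr, t), t) for t in target_attrs]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def most_uncertain(source_attrs, target_attrs):
    """Pick the source attribute whose top two candidates are closest in score.

    Assumes at least two target attributes. A small margin means the matcher
    cannot separate its best guesses, so a human label there is most useful.
    """
    def margin(source_attr):
        ranked = rank_targets(source_attr, target_attrs)
        return ranked[0][0] - ranked[1][0]

    return min(source_attrs, key=margin)


if __name__ == "__main__":
    # Hypothetical schemata: the target is larger than the source.
    source = ["customer_name", "order_date", "ship_address"]
    target = [
        "client_name",
        "customer_full_name",
        "date_of_order",
        "shipping_address",
        "billing_address",
    ]
    for attr in source:
        print(attr, "->", rank_targets(attr, target)[0][1])
    print("ask the user to label:", most_uncertain(source, target))
```

Token overlap cannot recognize that `ship` and `shipping` or `client` and `customer` mean the same thing, which is exactly the gap the abstract says a pre-trained language model is meant to close. Swapping `similarity` for an embedding-based score would leave the rest of the loop unchanged.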




