Improving Entity Linking in Chinese Domain by Sense Embedding Based on Graph Clustering

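Computer science; Entity linking; Artificial intelligence; Natural language processing; Ambiguity; Embedding; Cluster analysis; String (physics); Question answering; Information retrieval; Theoretical computer science; Knowledge base; Mathematics; Mathematical physics; Programming language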
Authors
Zhaobo Zhang, Zhi-Man Zhong, Pingpeng Yuan, Hai Jin
Source
Journal: Journal of Computer Science and Technology [Springer Nature]
Volume (issue): 38 (1): 196-210 Citations: 5
Identifier
DOI: 10.1007/s11390-023-2835-4
Abstract

Entity linking refers to linking a string in a text to corresponding entities in a knowledge base through candidate entity generation and candidate entity ranking. It is of great significance to some NLP (natural language processing) tasks, such as question answering. Unlike English entity linking, Chinese entity linking requires more consideration due to the lack of spacing and capitalization in text sequences and the ambiguity of characters and words, which is more evident in certain scenarios. In Chinese domains, such as industry, the generated candidate entities are usually composed of long strings and are heavily nested. In addition, the meanings of the words that make up industrial entities are sometimes ambiguous. Their semantic space is a subspace of the general word embedding space, and thus each entity word needs to get its exact meanings. Therefore, we propose two schemes to achieve better Chinese entity linking. First, we implement an n-gram based candidate entity generation method to increase the recall rate and reduce the nesting noise. Then, we enhance the corresponding candidate entity ranking mechanism by introducing sense embedding. Considering the contradiction between the ambiguity of word vectors and the single sense of the industrial domain, we design a sense embedding model based on graph clustering, which adopts an unsupervised approach for word sense induction and learns sense representation in conjunction with context. We test the embedding quality of our approach on classical datasets and demonstrate its disambiguation ability in general scenarios. We confirm that our method can better learn candidate entities’ fundamental laws in the industrial domain and achieve better performance on entity linking through experiments.
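
The n-gram based candidate generation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `generate_candidates`, the `max_n` parameter, the toy knowledge base, and the longest-match rule used to drop nested matches are all assumptions made for illustration. The sketch enumerates character n-grams, which needs no word segmentation (Chinese text has no spaces), and keeps only knowledge-base matches that are not contained in a longer match already accepted.

```python
def generate_candidates(text, kb_names, max_n=8):
    """Return knowledge-base names found as character n-grams of `text`.

    Hypothetical sketch: longer n-grams are tried first, and any match whose
    span lies inside an already accepted longer span is treated as nesting
    noise and skipped (an assumed filtering rule, not taken from the paper).
    """
    accepted = []  # list of ((start, end), name)
    for n in range(min(max_n, len(text)), 0, -1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            if gram not in kb_names:
                continue
            start, end = i, i + n
            nested = any(s <= start and end <= e for (s, e), _ in accepted)
            if not nested:
                accepted.append(((start, end), gram))
    # Report candidates in the order they occur in the text.
    return [name for _, name in sorted(accepted)]


# Toy knowledge base (made-up entries for illustration only).
kb = {"高压变频器", "变频器", "高压", "故障"}
print(generate_candidates("高压变频器故障", kb))
# → ['高压变频器', '故障']
```

Here "变频器" and "高压" are dropped because both are nested inside the longer match "高压变频器". A real system would pass the surviving candidates to a ranking stage, which the abstract describes as using sense embeddings.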