Pronounce differently, mean differently: A multi-tagging-scheme learning method for Chinese NER integrated with lexicon and phonetic features

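Computer science; Named-entity recognition; Natural language processing; Artificial intelligence; Task (project management); Lexicon; Scheme (mathematics); Benchmark (surveying); Feature (linguistics); Segmentation; Speech recognition; Linguistics; Mathematical analysis; Philosophy; Economics; Management; Mathematics; Geography; Geodesy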
Authors
Chengcheng Mai, Jian Liu, Mengchuan Qiu, Kaiwen Luo, Ziyan Peng, Chunfeng Yuan, Yihua Huang
Source
Journal: Information Processing and Management [Elsevier]
Volume (issue): 59 (5): 103041 Citations: 13
Identifier
DOI:10.1016/j.ipm.2022.103041
Abstract

Named Entity Recognition (NER) aims to automatically extract specific entities from unstructured text. Compared with English NER, Chinese NER is more challenging in recognizing entity boundaries because there are no explicit delimiters between Chinese characters. However, most previous studies focused on character-level semantic information in Chinese and ignored the importance of phonetic characteristics. To address this issue, we integrate phonetic features of Chinese characters with lexicon information to help disambiguate entity boundaries, fully exploring the potential of Chinese as a pictophonetic language. In addition, we propose a novel multi-tagging-scheme learning method, based on the multi-task learning paradigm, that separately annotates the segmentation information of entities and their corresponding entity types in order to alleviate the data sparsity and error propagation problems of previous tagging schemes. Extensive experiments on four Chinese NER benchmark datasets (OntoNotes4.0, MSRA, Resume, and Weibo) show that the proposed method consistently outperforms existing state-of-the-art baseline models. Ablation experiments further demonstrate that introducing the phonetic feature and the multi-tagging scheme has a significant positive effect on Chinese NER performance.
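
The idea of annotating segmentation information and entity types separately can be illustrated with a minimal Python sketch. It assumes a BIOES-style joint tagging scheme (for example `B-LOC`) and simply splits each joint tag into a boundary tag and a type tag. The abstract does not specify the tagging schemes actually used, so the function names `split_tags` and `merge_tags`, the BIOES labels, and the example sentence are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple


def split_tags(tags: List[str]) -> Tuple[List[str], List[str]]:
    """Split joint tags such as 'B-LOC' into segmentation tags and type tags.

    Hypothetical helper: assumes joint tags of the form '<boundary>-<type>'
    plus a bare 'O' for characters outside any entity.
    """
    seg_tags, type_tags = [], []
    for tag in tags:
        if tag == "O":
            seg_tags.append("O")
            type_tags.append("O")
        else:
            boundary, entity_type = tag.split("-", 1)
            seg_tags.append(boundary)      # where the entity starts and ends
            type_tags.append(entity_type)  # what kind of entity it is
    return seg_tags, type_tags


def merge_tags(seg_tags: List[str], type_tags: List[str]) -> List[str]:
    """Recombine the two label sequences into joint tags (inverse of split_tags)."""
    return ["O" if s == "O" else f"{s}-{t}" for s, t in zip(seg_tags, type_tags)]


# Illustrative sentence: 南京市 (Nanjing City) + 长江大桥 (Yangtze River Bridge).
# The character 长 is read "cháng" in 长江 but "zhǎng" in 市长 (mayor), which is
# the kind of boundary ambiguity that a phonetic cue could help resolve.
chars = list("南京市长江大桥")
joint = ["B-LOC", "I-LOC", "E-LOC", "B-LOC", "I-LOC", "I-LOC", "E-LOC"]

seg, typ = split_tags(joint)
for ch, s, t in zip(chars, seg, typ):
    print(ch, s, t)
```

With the labels split this way, the segmentation sequence uses only a handful of boundary labels regardless of how many entity types exist, which is one plausible reading of how separate annotation could reduce label sparsity in a multi-task setup.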
