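Computer science
Named-entity recognition
Natural language processing
Artificial intelligence
Task (project management)
Lexicon
Scheme (mathematics)
Benchmark (surveying)
Feature (linguistics)
Segmentation
Speech recognition
Linguistics
Mathematical analysis
Philosophy
Mathematics
Management
Geodesy
Economics
Geography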
Authors
Chengcheng Mai, Jian Liu, Mengchuan Qiu, Kaiwen Luo, Ziyan Peng, Chunfeng Yuan, Yihua Huang
Identifier
DOI: 10.1016/j.ipm.2022.103041
Abstract
Named Entity Recognition (NER) aims to automatically extract specific entities from unstructured text. Compared with English NER, Chinese NER makes recognizing entity boundaries more challenging because there are no explicit delimiters between Chinese characters. However, most previous research focused on character-level semantic information and ignored the importance of phonetic characteristics. To address this gap, we integrate phonetic features of Chinese characters with lexicon information to help disambiguate entity boundaries, exploiting the potential of Chinese as a pictophonetic language. In addition, we propose a novel multi-tagging-scheme learning method, based on the multi-task learning paradigm, that separately annotates the segmentation information of entities and their corresponding entity types, alleviating the data sparsity and error propagation problems of previous tagging schemes. Extensive experiments on four Chinese NER benchmark datasets (OntoNotes4.0, MSRA, Resume, and Weibo) show that the proposed method consistently outperforms existing state-of-the-art baseline models. Ablation experiments further demonstrate that introducing the phonetic feature and the multi-tagging scheme has a significant positive effect on Chinese NER performance.
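The idea of annotating segmentation information separately from entity types can be illustrated with a minimal sketch. It is not the authors' implementation: the BMES position labels, the `ORG`/`LOC` type labels, the example sentence, and the helper names `split_tags` and `merge_tags` are all assumptions chosen for illustration. The sketch only shows how a joint tag sequence could be decomposed into two label sequences that a multi-task model might predict separately.

```python
from typing import List, Tuple


def split_tags(joint_tags: List[str]) -> Tuple[List[str], List[str]]:
    """Split joint tags such as 'B-ORG' into a segmentation sequence
    (position labels only) and a type sequence (entity types only).
    Hypothetical helper; the BMES scheme is an assumption."""
    seg_tags, type_tags = [], []
    for tag in joint_tags:
        if tag == "O":
            # Characters outside any entity keep 'O' in both sequences.
            seg_tags.append("O")
            type_tags.append("O")
        else:
            position, entity_type = tag.split("-", 1)
            seg_tags.append(position)
            type_tags.append(entity_type)
    return seg_tags, type_tags


def merge_tags(seg_tags: List[str], type_tags: List[str]) -> List[str]:
    """Recombine the two sequences into joint tags (hypothetical helper)."""
    merged = []
    for position, entity_type in zip(seg_tags, type_tags):
        if position == "O" or entity_type == "O":
            merged.append("O")
        else:
            merged.append(f"{position}-{entity_type}")
    return merged


# Illustrative sentence: "南京大学在南京" (Nanjing University is in Nanjing).
chars = list("南京大学在南京")
joint = ["B-ORG", "M-ORG", "M-ORG", "E-ORG", "O", "B-LOC", "E-LOC"]

seg, typ = split_tags(joint)
for ch, s, t in zip(chars, seg, typ):
    print(ch, s, t)
```

With the labels split this way, the segmentation sequence uses only a handful of position labels regardless of how many entity types exist, which is one plausible reading of how such a decomposition could reduce label sparsity.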