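Named entity recognition
Leverage (statistics)
Artificial intelligence
Computer science
Inference
Labeled data
Background (archaeology)
Machine learning
Set (abstract data type)
Natural language processing
Process (computing)
Training set
Noise (video)
Data mining
Programming language
Economics
Management
Paleontology
Image (mathematics)
Biology
Task (project management)
Authors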
Zhang Fu,Liangdong Ma,Jiapeng Wang,Jingwei Cheng
Abstract
Currently, named entity recognition (NER) is mainly evaluated on standard, well-annotated data sets. However, constructing a well-annotated data set consumes a great deal of manpower and time. In many applications of NER, data sets may contain a lot of noise, and a large part of that noise comes from unlabeled entities. At present, most models treat unlabeled entities as non-entities during training. This biases them toward predicting most words of an input context as non-entities and greatly degrades their performance. In this paper, as the first attempt, we propose an adaptive positive-unlabeled (adaPU) learning technique and integrate it into a machine reading comprehension (MRC) framework for NER. The resulting framework still performs well on data sets with a large proportion of unlabeled entities. To alleviate the tendency of a model to predict most words of an input context as non-entities, adaPU learning adjusts the loss coefficients of positive and negative samples. Moreover, instead of constructing just one fixed query per entity type as input to MRC, we propose a new method that dynamically constructs multiple queries for each entity type, which also brings a slight performance improvement for NER. Accordingly, we explore new training and entity inference strategies for our learning framework. The experimental results show that our framework is effective on data sets that contain a large number of unlabeled entities. Even when the proportion of unlabeled entities reaches 50%, it remains effective and maintains F1 scores above 80 on several data sets. The experiments also show that our framework achieves better or competitive performance on standard data sets. Ablation experiments further demonstrate that both adaPU learning and the dynamic query construction method improve the NER performance of our MRC framework.
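The abstract says adaPU works by adjusting the loss coefficients of positive and negative samples, but it does not give the formula. The sketch below is therefore a hypothetical illustration of that general idea, not the authors' method. It computes a token-level weighted binary cross-entropy of the kind an MRC-style NER model might use to score entity positions. Annotated entity tokens are positives. Every other token is "unlabeled", meaning it may be a true non-entity or a missed entity. Three details are assumptions for illustration only:

- the function name `adaptive_pu_bce`;
- the class-balanced weighting that adapts to each batch's label counts;
- the `unlabeled_discount` parameter, which shrinks the loss on unlabeled tokens.

```python
import math
from typing import Sequence


def adaptive_pu_bce(
    probs: Sequence[float],
    labels: Sequence[int],
    unlabeled_discount: float = 0.5,
    eps: float = 1e-7,
) -> float:
    """Hypothetical positive-unlabeled weighted binary cross-entropy.

    probs: predicted probability that each token belongs to an entity.
    labels: 1 for annotated entity tokens (positives), 0 for all other
        tokens (unlabeled: true non-entities or unannotated entities).
    unlabeled_discount: assumed factor in (0, 1] that down-weights the
        loss on unlabeled tokens, since some of them may be real entities.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not labels:
        raise ValueError("need at least one token")

    n = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    n_unl = n - n_pos

    if n_pos == 0 or n_unl == 0:
        # One group is absent: skip balancing, keep only the discount.
        w_pos, w_unl = 1.0, unlabeled_discount
    else:
        # Weights adapt to the batch: the rarer positive tokens get a
        # larger coefficient so they are not drowned out by unlabeled ones.
        w_pos = n / (2 * n_pos)
        w_unl = unlabeled_discount * n / (2 * n_unl)

    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_unl * math.log(1.0 - p)
    return total / n


if __name__ == "__main__":
    # Token 4 is an entity the annotators missed; the model predicts it anyway.
    probs = [0.9, 0.2, 0.1, 0.8]
    labels = [1, 0, 0, 0]
    print(adaptive_pu_bce(probs, labels, unlabeled_discount=1.0))
    print(adaptive_pu_bce(probs, labels, unlabeled_discount=0.5))
```

With the discount below 1, the model is penalized less for predicting an entity on an unlabeled token. This is the behavior the abstract motivates: the model should not be pushed to label most words as non-entities. A real implementation would operate on tensors and might learn or schedule the coefficients rather than fixing them.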