An MRC and adaptive positive‐unlabeled learning framework for incompletely labeled named entity recognition


Authors

Zhang Fu, Liangdong Ma, Jiapeng Wang, Jingwei Cheng

Source

Journal: International Journal of Intelligent Systems (Wiley)
Date: 2022-08-22; Volume (Issue): 37 (11), pp. 9580-9597; Citations: 1

Link

https://doi.org/10.1002/int.23015

Identifier

DOI: 10.1002/int.23015

Abstract

Currently, named entity recognition (NER) is mainly evaluated on standard, well-annotated data sets. However, constructing a well-annotated data set consumes considerable manpower and time. In many NER applications, data sets contain substantial noise, and a large part of this noise comes from unlabeled entities. At present, most models treat unlabeled entities as nonentities during training, which biases them toward predicting most words of an input context as nonentities and greatly degrades their performance. In this paper, as a first attempt, we propose an adaptive positive-unlabeled (adaPU) learning technique and integrate it into a machine reading comprehension (MRC) framework for NER, which continues to perform well on data sets with a large proportion of unlabeled entities. In our framework, to alleviate the tendency of a model to predict most words of an input context as nonentities, adaPU learning adjusts the loss coefficients of positive and negative samples. Moreover, instead of constructing only a fixed query for each entity type as input to MRC, we propose a method that dynamically constructs multiple queries for each entity type, which also brings a slight performance improvement for NER. Accordingly, we explore new training and entity inference strategies for our learning framework. The experimental results show that our framework is effective on data sets that contain a large number of unlabeled entities. Even when the proportion of unlabeled entities reaches 50%, our framework remains effective and maintains F1-scores above 80 on several data sets. The experiments also show that our framework achieves better or competitive performance on standard data sets. Ablation experiments further demonstrate that both adaPU learning and the dynamic query construction method in our MRC framework improve NER performance.
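The abstract says only that adaPU learning works by adjusting the loss coefficients of positive and negative samples; the exact coefficient is not given here. The following Python sketch illustrates the general idea under stated assumptions and is not the authors' formulation. It assumes a typical MRC-style NER setup in which, for a given entity-type query, each token receives a probability of starting an entity, trained with per-token binary cross-entropy. Tokens labeled as entities are positives; all other tokens are treated as unlabeled and down-weighted. The helper `adaptive_neg_weight`, its `prior_pos_ratio` (an assumed true entity ratio) and its `floor` are hypothetical illustrations.

```python
import math
from typing import Sequence


def weighted_bce(probs: Sequence[float], labels: Sequence[int],
                 pos_weight: float = 1.0, neg_weight: float = 1.0,
                 eps: float = 1e-7) -> float:
    """Mean per-token binary cross-entropy with separate coefficients for
    labeled positives (label 1) and unlabeled tokens (label 0)."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total -= pos_weight * math.log(p)        # labeled entity token
        else:
            total -= neg_weight * math.log(1.0 - p)  # unlabeled: may hide an entity
    return total / len(probs)


def adaptive_neg_weight(labels: Sequence[int], prior_pos_ratio: float,
                        floor: float = 0.1) -> float:
    """Hypothetical heuristic (not from the paper): scale the unlabeled-token
    coefficient by the estimated fraction of true entities that are labeled."""
    observed = sum(labels) / len(labels)
    if prior_pos_ratio <= observed:
        return 1.0  # labels look complete, so no down-weighting
    coverage = observed / prior_pos_ratio
    return max(coverage, floor)  # never ignore unlabeled tokens entirely


if __name__ == "__main__":
    # Toy start-position targets for one query: 1 labeled entity in 10 tokens.
    labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    probs = [0.9, 0.7, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1]
    beta = adaptive_neg_weight(labels, prior_pos_ratio=0.2)  # assumed prior
    print("neg weight:", beta)
    print("plain BCE:   ", weighted_bce(probs, labels))
    print("weighted BCE:", weighted_bce(probs, labels, neg_weight=beta))
```

Down-weighting unlabeled tokens reduces the penalty for predicting an entity at a position that was simply left unannotated (token 2 above, with probability 0.7), which counters the bias toward predicting nonentities; the floor keeps genuine nonentities in the loss. The coefficient the paper actually uses should be taken from the full text.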
