Keywords
Named entity recognition
Natural language processing
Artificial intelligence
Language model
Machine learning
Training set
Labeled data
Benchmark
Computer science
Authors
Liang Chen, Yue Yu, Haoming Jiang, Siawpeng Er, Ruijia Wang, Tuo Zhao, Chao Zhang
Identifier
DOI:10.1145/3394486.3403149
Abstract
We study the open-domain named entity recognition (NER) problem under distant supervision. Distant supervision, though it does not require large amounts of manual annotation, yields highly incomplete and noisy distant labels via external knowledge bases. To address this challenge, we propose a new computational framework, BOND, which leverages the power of pre-trained language models (e.g., BERT and RoBERTa) to improve the prediction performance of NER models. Specifically, we propose a two-stage training algorithm: In the first stage, we adapt the pre-trained language model to the NER task using the distant labels, which significantly improves both precision and recall; In the second stage, we drop the distant labels and propose a self-training approach to further improve model performance. Thorough experiments on 5 benchmark datasets demonstrate the superiority of BOND over existing distantly supervised NER methods. The code and distantly labeled data have been released at https://github.com/cliang1453/BOND.
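The two-stage recipe the abstract describes can be sketched in miniature: stage one fits a model on the noisy distant labels, and stage two discards those labels and iterates teacher-student self-training on confident pseudo-labels. The sketch below uses a toy nearest-centroid classifier in place of a pre-trained language model; the `NearestCentroid` class, the margin-based confidence score, and the threshold are illustrative assumptions, not the paper's actual implementation.

```python
import math

class NearestCentroid:
    """Toy stand-in for the NER model: predicts the label of the nearest class centroid."""
    def fit(self, xs, ys):
        sums, counts = {}, {}
        for (x0, x1), y in zip(xs, ys):
            s0, s1 = sums.get(y, (0.0, 0.0))
            sums[y] = (s0 + x0, s1 + x1)
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: (s[0] / counts[y], s[1] / counts[y])
                          for y, s in sums.items()}
        return self

    def predict_with_confidence(self, x):
        dists = {y: math.dist(x, c) for y, c in self.centroids.items()}
        label = min(dists, key=dists.get)
        # Crude confidence: margin between the best and second-best distance.
        others = [d for y, d in dists.items() if y != label]
        conf = (min(others) - dists[label]) if others else 1.0
        return label, conf

def bond_sketch(xs, distant_labels, rounds=3, threshold=0.5):
    # Stage I: adapt the model to the task using the (noisy) distant labels.
    teacher = NearestCentroid().fit(xs, distant_labels)
    # Stage II: drop the distant labels; self-train on confident pseudo-labels.
    for _ in range(rounds):
        pseudo = [teacher.predict_with_confidence(x) for x in xs]
        kept = [(x, y) for x, (y, c) in zip(xs, pseudo) if c >= threshold]
        if not kept:
            break
        student = NearestCentroid().fit([x for x, _ in kept],
                                        [y for _, y in kept])
        teacher = student  # the student becomes the next round's teacher
    return teacher
```

Only points the teacher labels with a margin above `threshold` survive into the student's training set, which is the mechanism by which self-training filters out residual noise from the distant labels.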