Exploring Self-Distillation Based Relational Reasoning Training for Document-Level Relation Extraction

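Computer science; Relation (database); Artificial intelligence; Relational database; Feature (linguistics); Relational model; Process (computing); Relationship extraction; Statistical relational learning; Natural language processing; Data mining; Machine learning; Linguistics; Philosophy; Operating system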

Authors

Liang Zhang, Jinsong Su, Zijun Min, Zhongjian Miao, Qingguo Hu, Biao Fu, Xiaodong Shi, Yidong Chen

Source

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence [Association for the Advancement of Artificial Intelligence (AAAI)]
Date: 2023-06-26; Volume (issue), pages: 37 (11): 13967-13975; Citations: 5

Links

aaai.org, doi.org

Identifier

DOI: 10.1609/aaai.v37i11.26635

Abstract

Document-level relation extraction (RE) aims to extract relational triples from a document. One of its primary challenges is to predict implicit relations between entities, which are not explicitly expressed in the document but can usually be extracted through relational reasoning. Previous methods mainly model relational reasoning implicitly, through the interaction among entities or entity pairs. However, they suffer from two deficiencies: 1) they often consider only one reasoning pattern, whose coverage of relational triples is limited; 2) they do not explicitly model the process of relational reasoning. In this paper, to deal with the first problem, we propose a document-level RE model with a reasoning module whose core unit is the reasoning multi-head self-attention unit. This unit is a variant of conventional multi-head self-attention and uses four attention heads to model four common reasoning patterns, respectively, which can cover more relational triples than previous methods. Then, to address the second issue, we propose a self-distillation training framework, which contains two branches sharing parameters. In the first branch, we randomly mask some entity pair feature vectors in the document, and then train our reasoning module to infer their relations by exploiting the feature information of other related entity pairs. By doing so, we can explicitly model the process of relational reasoning. However, because the additional masking operation is not used during testing, it causes an input gap between training and testing scenarios, which would hurt the model performance. To reduce this gap, we perform conventional supervised training without the masking operation in the second branch and use Kullback-Leibler divergence loss to minimize the difference between the predictions of the two branches. Finally, we conduct comprehensive experiments on three benchmark datasets, and the results demonstrate that our model consistently outperforms all competitive baselines. Our source code is available at https://github.com/DeepLearnXMU/DocRE-SD
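
The two-branch training objective described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation (the linked repository is the reference for that). Several details are assumptions made only for this sketch: masked entity-pair vectors are replaced by zeros, a mean-pooled context stands in for the reasoning module, relations are treated as a single-label softmax classification, the KL term is computed as KL(unmasked || masked), and the three loss terms are summed with a hypothetical weight `alpha`. All function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the gold label."""
    return -math.log(probs[label] + eps)


def mask_pairs(features, mask_rate, rng):
    """Randomly replace entity-pair vectors with zeros (assumed mask value)."""
    masked, flags = [], []
    for vec in features:
        hide = rng.random() < mask_rate
        flags.append(hide)
        masked.append([0.0] * len(vec) if hide else list(vec))
    return masked, flags


def forward(features, weights):
    """Shared-parameter scorer used by both branches.

    Stand-in for the reasoning module: each pair sees its own vector plus the
    mean of all pair vectors, so a masked pair can still draw on the others.
    """
    dim = len(features[0])
    mean = [sum(v[d] for v in features) / len(features) for d in range(dim)]
    outputs = []
    for vec in features:
        ctx = [a + b for a, b in zip(vec, mean)]
        logits = [sum(w * x for w, x in zip(row, ctx)) for row in weights]
        outputs.append(softmax(logits))
    return outputs


def self_distillation_loss(features, labels, weights,
                           mask_rate=0.3, alpha=1.0, seed=0):
    """Return (total loss, KL term, mask flags) for one document."""
    rng = random.Random(seed)
    masked_features, flags = mask_pairs(features, mask_rate, rng)
    p_masked = forward(masked_features, weights)  # branch 1: masked input
    p_clean = forward(features, weights)          # branch 2: unmasked input
    n = len(features)
    ce_masked = sum(cross_entropy(p, y) for p, y in zip(p_masked, labels)) / n
    ce_clean = sum(cross_entropy(p, y) for p, y in zip(p_clean, labels)) / n
    # Consistency term pulling the two branches' predictions together.
    kl = sum(kl_divergence(pc, pm) for pc, pm in zip(p_clean, p_masked)) / n
    return ce_masked + ce_clean + alpha * kl, kl, flags
```

With `mask_rate=0.0` the two branches receive identical input, so the KL term vanishes; as more pairs are masked, the term grows and penalizes disagreement between the branches, which is the role the abstract assigns to the Kullback-Leibler loss.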
