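Computer science
Convolutional neural network
Citation
Artificial intelligence
Binary classification
Class (philosophy)
Context (archaeology)
Feature (linguistics)
Embedding
Machine learning
Pattern recognition (psychology)
Natural language processing
Information retrieval
Support vector machine
World Wide Web
Paleontology
Philosophy
Biology
Linguistics
Authors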
Naif Radi Aljohani, Ayman G. Fayoumi, Saeed‐Ul Hassan
Identifier
DOI:10.1177/0165551521991022
Abstract
We argue that citations serve different purposes and functions and therefore should not all be treated in the same way. Using a large dataset of about 10K citation contexts, annotated by human experts and extracted from the Association for Computational Linguistics repository, we present a deep learning–based citation context classification architecture. Unlike existing state-of-the-art feature-based citation classification models, our proposed convolutional neural network (CNN) with fastText-based pre-trained embedding vectors uses only the citation context as its input, yet outperforms them in both binary (important and non-important) and multi-class (Use, Extends, CompareOrContrast, Motivation, Background, Other) citation classification tasks. Furthermore, we propose using focal-loss and class-weight functions in the CNN model to overcome the inherent class imbalance in citation classification datasets. We show that using the focal-loss function with the CNN adds a factor of $(1 - p_t)^{\gamma}$ to the cross-entropy function. Our model improves on the baseline results, achieving a 90.6 F1 score with 90.7% accuracy on the binary task and a 72.3 F1 score with 72.1% accuracy on the multi-class task.
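The relationship between focal loss and cross-entropy mentioned in the abstract can be illustrated with a small sketch. It assumes the standard focal-loss formulation, in which cross-entropy is multiplied by a modulating factor (1 - p_t)^gamma, where p_t is the predicted probability of the true class. The function names, the default gamma = 2.0, and the optional per-class weight `alpha` are illustrative assumptions, not values taken from the paper.

```python
import math


def cross_entropy(p_t: float) -> float:
    """Cross-entropy for one example, given the probability of its true class."""
    return -math.log(p_t)


def focal_loss(p_t: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Standard focal loss: cross-entropy scaled by (1 - p_t) ** gamma.

    gamma (focusing parameter) and alpha (class weight) are assumed
    hyperparameters; the abstract does not state the values used.
    """
    return alpha * (1.0 - p_t) ** gamma * cross_entropy(p_t)


if __name__ == "__main__":
    # Well-classified examples (high p_t) are down-weighted far more
    # strongly than hard examples (low p_t).
    for p_t in (0.9, 0.5, 0.1):
        ce = cross_entropy(p_t)
        fl = focal_loss(p_t)
        print(f"p_t={p_t:.1f}  CE={ce:.4f}  FL={fl:.4f}  factor={fl / ce:.4f}")
```

With gamma set to 0 the modulating factor equals 1 and the loss reduces to plain cross-entropy; larger values of gamma shrink the contribution of easy examples, which is why the loss is commonly used for imbalanced datasets.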