Computer science
Artificial intelligence
word2vec
Natural language processing
Named entity recognition
Word (group theory)
Word embedding
Deep learning
Relation extraction
Transfer learning
Machine learning
Authors
Mercedes Arguello-Casteleiro,Nava Maroto,Chris Wroe,Carlos Sevillano Torrado,Cory Henson,Julio Des-Diz,M.J. Fernandez-Prieto,TJ Furmston,Diego Maseda Fernandez,Mohak Kulshrestha,Robert Stevens,John Keane,Simon Peters
Identifier
DOI:10.1007/978-3-030-91100-3_14
Abstract
Deep learning for natural language processing acquires dense vector representations for n-grams from large-scale unstructured corpora. Converting static embeddings of n-grams into a dataset of interlinked concepts with explicit contextual semantic dependencies provides the foundation for acquiring reusable knowledge. However, validating this knowledge requires cross-checking against ground truths that may be unavailable in an actionable or computable form. This paper presents a novel approach from the new field of explainable active learning that combines methods for learning static embeddings (word2vec models) with methods for learning dynamic contextual embeddings (transformer-based BERT models). We created a dataset for named entity recognition (NER) and relation extraction (REX) for the Coronavirus Disease 2019 (COVID-19). The COVID-19 dataset contains 2,212 associations captured by 11 word2vec models, with additional examples of use from the biomedical literature. We propose interpreting the NER and REX tasks for COVID-19 as Question Answering (QA), incorporating general medical knowledge within the question, e.g. “does ‘cough’ (n-gram) belong to ‘clinical presentation/symptoms’ for COVID-19?”. We evaluated biomedical-specific pre-trained language models (BioBERT, SciBERT, ClinicalBERT, BlueBERT, and PubMedBERT) against general-domain pre-trained language models (BERT and RoBERTa) for transfer learning with the COVID-19 dataset, i.e. task-specific fine-tuning treating NER as a sequence-level task. Using 2,060 QA pairs for training (associations from 10 word2vec models) and 152 QA pairs for validation (associations from 1 word2vec model), BERT obtained an F-measure of 87.38%, with precision = 93.75% and recall = 81.82%. SciBERT achieved the highest F-measure of 94.34%, with precision = 98.04% and recall = 90.91%.
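To make the abstract's setup concrete, below is a minimal sketch (not the authors' released code) of how an association can be cast as a yes/no question and a pretrained biomedical BERT fine-tuned for binary sequence-level classification via Hugging Face Transformers. The checkpoint name, the toy QA pairs, and all hyperparameters are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch: yes/no QA framing of NER/REX associations, fine-tuned as
# binary sequence classification. Model name and hyperparameters below are
# assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "allenai/scibert_scivocab_uncased"  # SciBERT; swap in BioBERT, etc.
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Each association becomes a question that embeds general medical knowledge,
# labelled 1 (association holds) or 0 (it does not). Toy examples only.
train_qa = [
    ("does 'cough' belong to 'clinical presentation/symptoms' for COVID-19?", 1),
    ("does 'cough' belong to 'treatment' for COVID-19?", 0),
]

enc = tokenizer([q for q, _ in train_qa], padding=True, truncation=True,
                return_tensors="pt")
labels = torch.tensor([y for _, y in train_qa])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):                 # task-specific fine-tuning
    optimizer.zero_grad()
    out = model(**enc, labels=labels)  # sequence-level classification head
    out.loss.backward()
    optimizer.step()

# Inference on a held-out association.
model.eval()
with torch.no_grad():
    q = tokenizer("does 'fever' belong to 'clinical presentation/symptoms' "
                  "for COVID-19?", return_tensors="pt")
    pred = model(**q).logits.argmax(dim=-1).item()  # 1 = yes, 0 = no
```

In the paper's evaluation scheme, the held-out QA pairs correspond to the associations from one word2vec model, on which precision, recall, and F-measure are computed.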