Named entity recognition of Chinese electronic medical records based on a hybrid neural network and medical MC-BERT

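Keywords: computer science, artificial intelligence, word embedding, features, natural language processing, named entity recognition, artificial neural network, context, feature engineering, feature vector, word, hybrid neural network, vocabulary, deep learning, information retrieval, task, embedding, linguistics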
Authors
Peng Chen, Meng Zhang, Xiaosheng Yu, Songpu Li
Source
Journal: BMC Medical Informatics and Decision Making [Springer Nature]
Volume (issue): 22 (1); Citations: 4
Identifier
DOI: 10.1186/s12911-022-02059-2
Abstract

Background: Named entity recognition (NER) in electronic medical records (EMRs) is an important task in clinical medical research. Deep learning combined with pretrained models performs well at recognizing entities in clinical text. However, Chinese EMRs have a distinctive text structure and vocabulary distribution, so general-domain pretrained models cannot effectively incorporate entities and medical domain knowledge into representation learning. In addition, a single deep network lacks the capacity to fully extract the rich features of complex texts. Both limitations degrade NER performance on EMRs.

Methods: To represent EMR text better, we extract both the local features of the text and multilevel sequence interaction information, with the aim of improving EMR NER. We propose a hybrid neural network model built on the medical pretrained model MC-BERT, namely MC-BERT + BiLSTM + CNN + MHA + CRF. First, MC-BERT serves as the word embedding model and produces the word vectors of the text. Next, a BiLSTM extracts forward and backward sequence features from these vectors, while a CNN extracts local context features, yielding two feature vectors. The two feature vectors are merged and passed to multihead self-attention (MHA) to obtain multilevel semantic features. Finally, a CRF decodes these features and predicts the label sequence.

Results: Our MC-BERT-based hybrid model achieves F1 scores of 94.22%, 86.47% and 92.28% on the CCKS-2017, CCKS-2019 and cEHRNER datasets, respectively, improvements of 0.89%, 1.65% and 2.63% over a BiLSTM + CRF model built on general-domain BERT. Finally, we analyze how the imbalance in the number of entities in the EMRs affects the NER results.
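The abstract describes the model only at the block-diagram level. The pure-Python sketch below illustrates three of its later steps on toy numbers: merging per-token BiLSTM and CNN features, multihead self-attention over the merged vectors, and Viterbi decoding under a linear-chain CRF. It is not the authors' implementation. MC-BERT, the BiLSTM and the CNN are stubbed out by fixed toy vectors, and several choices are assumptions: merging by concatenation, identity query/key/value projections in the attention layer, the hypothetical `emission_scores` layer and its weights, and the hypothetical BIO tag set `O`, `B-DIS`, `I-DIS`.

```python
import math
from typing import List

Vector = List[float]


def concat_features(bilstm: List[Vector], cnn: List[Vector]) -> List[Vector]:
    """Merge per-token BiLSTM and CNN features (assumption: by concatenation)."""
    return [h + c for h, c in zip(bilstm, cnn)]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(x: List[Vector], num_heads: int) -> List[Vector]:
    """Scaled dot-product self-attention, one head per slice of the feature dim.
    Simplification: learned Q/K/V and output projections are omitted (identity)."""
    dim = len(x[0])
    assert dim % num_heads == 0, "feature dim must split evenly across heads"
    d_head = dim // num_heads
    out: List[Vector] = [[] for _ in x]
    for h in range(num_heads):
        heads = [row[h * d_head:(h + 1) * d_head] for row in x]
        for i, q in enumerate(heads):
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_head) for k in heads]
            weights = softmax(scores)
            # Weighted sum of all tokens' value slices for this head.
            out[i].extend(sum(w * v[j] for w, v in zip(weights, heads)) for j in range(d_head))
    return out


def emission_scores(x: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Hypothetical linear layer mapping each token vector to per-tag scores."""
    num_tags = len(weight[0])
    return [[sum(xi * weight[i][k] for i, xi in enumerate(row)) for k in range(num_tags)]
            for row in x]


def viterbi_decode(emissions: List[Vector], transitions: List[Vector]) -> List[int]:
    """Best tag path under a linear-chain CRF whose path score is the sum of
    emissions[t][tag] plus transitions[prev_tag][tag] (no start/end scores)."""
    num_tags = len(emissions[0])
    score = list(emissions[0])
    backptrs: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for cur in range(num_tags):
            best_prev = max(range(num_tags), key=lambda p: score[p] + transitions[p][cur])
            ptrs.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][cur] + emit[cur])
        score = new_score
        backptrs.append(ptrs)
    best = max(range(num_tags), key=lambda t: score[t])
    path = [best]
    for ptrs in reversed(backptrs):  # follow back-pointers from the last token
        best = ptrs[best]
        path.append(best)
    return path[::-1]


if __name__ == "__main__":
    tags = ["O", "B-DIS", "I-DIS"]                  # hypothetical BIO tag set
    bilstm = [[0.1, 0.3], [0.4, 0.2], [0.5, 0.9]]   # toy BiLSTM outputs, 3 tokens
    cnn = [[0.2, 0.0], [0.1, 0.7], [0.3, 0.3]]      # toy CNN outputs, 3 tokens
    fused = concat_features(bilstm, cnn)            # 4 features per token
    attended = multi_head_self_attention(fused, num_heads=2)
    weight = [[1.0, -0.5, 0.0], [0.0, 1.0, 0.5], [-0.5, 0.0, 1.0], [0.2, 0.3, 0.4]]
    trans = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # forbid O -> I-DIS
    path = viterbi_decode(emission_scores(attended, weight), trans)
    print([tags[t] for t in path])
```

The CRF layer is what lets the decoder reject label sequences that per-token argmax would allow, such as `I-DIS` directly after `O`. In a trained model the BiLSTM, CNN, attention projections, emission layer and transition matrix would typically be learned jointly, with the CRF trained on the sequence log-likelihood; the sketch shows only how the pieces fit together at inference time.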