Computer science
Sentence
Language model
Normalization (sociology)
F1 score
Artificial intelligence
Domain (mathematical analysis)
Benchmark (surveying)
Unified Medical Language System
Natural language processing
Medical record
Named entity recognition
Information retrieval
Medicine
Mathematical analysis
Mathematics
Management
Geodesy
Sociology
Anthropology
Economics
Radiology
Task (project management)
Geography
Authors
Yuting Zou,Peng Zhang,Yunchao Ling,Daqing Lv,Ziming Li,Lu Shen,Guoqing Zhang
Identifier
DOI: 10.1109/bibm58861.2023.10386068
Abstract
Accurately extracting and classifying Chinese electronic medical records (EMRs), which contain vast amounts of valuable medical information, has promising practical application and medical value in Chinese health care. While this pivotal issue has attracted escalating attention, the bulk of current research is directed towards operations at the document or entity level within medical records. Only a limited body of work addresses these concerns at the sentence level, a critical aspect for downstream tasks such as medical information retrieval, diagnosis normalization, and question answering. In this paper, we present a domain-adaptive pre-trained language model named CEMR-LM for sentence classification of Chinese EMRs. CEMR-LM acquires Chinese medical domain knowledge by pre-training the language model on a large unlabeled clinical corpus. This is reinforced by combining a fine-tuning strategy with a dual-channel mechanism, which together contribute to the model's improved performance. Experiments on a benchmark dataset and a real-world hospital dataset both demonstrate that CEMR-LM is superior to state-of-the-art methods. Furthermore, CEMR-LM can highlight indicative elements within medical records by visualizing the attention weights embedded in the model. The implemented code and experimental datasets are available online at https://github.com/BioMedBigDataCenter/CEMR-LM.
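The abstract describes CEMR-LM only at a high level (domain-adaptive pre-training, a fine-tuning strategy, a dual-channel mechanism, and attention-weight visualization), so the following is a minimal sketch of the sentence-classification fine-tuning and attention-inspection steps, assuming a generic Chinese BERT checkpoint (bert-base-chinese) as a stand-in for the paper's domain-adaptively pre-trained weights; the label set, example sentence, and all hyperparameters are hypothetical and not taken from the paper.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Hypothetical sentence-level categories for Chinese EMR text; the
# actual label set is not given in the abstract.
labels = ["chief_complaint", "present_history", "diagnosis", "treatment"]

# Stand-in checkpoint; CEMR-LM itself would load domain-adaptively
# pre-trained weights instead of the generic bert-base-chinese model.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese",
    num_labels=len(labels),
    output_attentions=True,  # expose attention weights for later inspection
)
model.eval()

sentence = "患者自述持续性头痛三天。"  # "The patient reports a persistent headache for three days."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

print("predicted category:", labels[outputs.logits.argmax(dim=-1).item()])

# outputs.attentions is a tuple with one tensor per layer, shaped
# (batch, heads, seq_len, seq_len); inspecting these weights is analogous
# to the attention-weight visualization the abstract mentions.
print("attention layers:", len(outputs.attentions))

In a real fine-tuning run, the classifier head would first be trained on labeled EMR sentences before inference; this sketch shows only the inference and attention-access pattern.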