Computer science
Transfer learning
Artificial intelligence
Coding (social sciences)
Unsupervised learning
Machine learning
Medical classification
Source code
Data modeling
Data mining
Database
Mathematics
Medicine
Statistics
Operating system
Nursing
Authors
Amit Kumar,Souparna Das,Suman Roy
Identifier
DOI:10.1109/icdh60066.2023.00047
Abstract
In the healthcare industry, it is standard practice to assign a set of International Classification of Diseases (ICD) codes to a clinical note (a patient visit, a discharge summary, and the like) as part of the medical coding process mandated by medical care and patient billing. Most automated ICD coding methods adopt a supervised framework in which a subset of the clinical notes is labeled a priori with ICD codes. In many cases, however, enough labeled texts are not available, which calls for unsupervised assignment of ICD codes. The quality of the data plays an important role in the performance of unsupervised coding: low-quality data leads to degraded performance. In this paper, we explore a transfer learning approach for ICD coding using a combination of pre-training and supervised fine-tuning. We use a hierarchical BERT model, a Bi-LSTM layered on top of BERT (which removes the restriction on the size of clinical texts), as the model architecture, and pre-train it on the total corpus (which includes both labeled and unlabeled data). We then transfer its weights to fine-tune the model on labeled data (MIMIC data) in a supervised framework, and use this model to predict ICD codes for unlabeled data using token similarity. To our knowledge, this is the first use of transfer learning in ICD prediction. Finally, we show the efficacy of our transfer learning approach through rigorous experimentation: there is a 20% gain in sensitivity (recall) and a 6% lift in specificity in ICD prediction compared to direct unsupervised prediction.