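Computer science
Coding (social sciences)
Benchmark (surveying)
Macro
Artificial intelligence
Task (project management)
ICD-10
Machine learning
Data mining
Natural language processing
Pattern recognition (psychology)
Engineering
Medicine
Statistics
Mathematics
Geodesy
Systems engineering
Psychiatry
Programming language
Geography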
Authors
Junping Liu,Shichen Yang,Tao Peng,Xinrong Hu,Qiang Zhu
Identifier
DOI:10.1109/bibm58861.2023.10385482
Abstract
Automated International Classification of Diseases (ICD) coding assigns disease codes to clinical texts automatically and is treated as a multi-label classification task. Because most ICD codes are rare, the imbalanced label distribution and small sample sizes make this task challenging. Inspired by the recent success of ChatGPT and prompt-based fine-tuning, this study proposes a model called ChatICD to address few-shot ICD coding. First, ChatGPT is used for data augmentation, rephrasing the descriptions of ICD codes into multiple samples. Then, ChatICD fine-tunes the pretrained model by generating prompt templates and label mapping words. We evaluate ChatICD on two benchmark datasets, MIMIC-III-50 and MIMIC-III-rare50. On the few-shot ICD coding task of MIMIC-III-rare50, ChatICD achieves a macro-F1 of 35.8% and a micro-F1 of 38.2%, improvements of 5.4% and 5.6%, respectively, over the current best model.
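The abstract reports both macro-F1 and micro-F1, and the two diverge sharply when most labels are rare. The following is a minimal sketch of how the two averages are commonly computed for multi-label predictions. It is not the authors' evaluation code: the function names and the toy label matrix are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float]:
    """Return (macro-F1, micro-F1) for binary multi-label indicator matrices."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels, so frequent labels dominate.
    micro = f1(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return macro, micro


# Hypothetical toy data: 4 documents, 3 codes; code 0 is frequent, 1 and 2 are rare.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]]
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro-F1={macro:.3f} micro-F1={micro:.3f}")
```

In this toy case the classifier gets the frequent code right and misses both rare codes, so micro-F1 stays high while macro-F1 drops, which is why macro-F1 is the more demanding number on a rare-code benchmark.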