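Medicine
Set (abstract data type)
Retrospective cohort study
Receiver operating characteristic
Radiography
Machine learning
Radiology
Computer science
Pathology
Internal medicine
Programming language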
Authors
Pritam Mukherjee, Benjamin Hou, Ricardo Bigolin Lanfredi, Ronald M. Summers
Source
Journal: Radiology
[Radiological Society of North America]
Date: 2023-10-01
Volume (issue): 309 (1)
Citations: 29
Identifier
DOI: 10.1148/radiol.231147
Abstract
Background: Large language models (LLMs) such as ChatGPT, though proficient in many text-based tasks, are not suitable for use with radiology reports due to patient privacy constraints.
Purpose: To test the feasibility of using an alternative LLM (Vicuna-13B) that can be run locally for labeling radiography reports.
Materials and Methods: Chest radiography reports from the MIMIC-CXR and National Institutes of Health (NIH) data sets were included in this retrospective study. Reports were examined for 13 findings. Outputs reporting the presence or absence of the 13 findings were generated by Vicuna by using a single-step or multistep prompting strategy (prompts 1 and 2, respectively). Agreements between Vicuna outputs and CheXpert and CheXbert labelers were assessed using Fleiss κ. Agreement between Vicuna outputs from three runs under a hyperparameter setting that introduced some randomness (temperature, 0.7) was also assessed. The performance of Vicuna and the labelers was assessed in a subset of 100 NIH reports annotated by a radiologist with use of area under the receiver operating characteristic curve (AUC).
Results: A total of 3269 reports from the MIMIC-CXR data set (median patient age, 68 years [IQR, 59–79 years]; 161 male patients) and 25 596 reports from the NIH data set (median patient age, 47 years [IQR, 32–58 years]; 1557 male patients) were included. Vicuna outputs with prompt 2 showed, on average, moderate to substantial agreement with the labelers on the MIMIC-CXR (κ median, 0.57 [IQR, 0.45–0.66] with CheXpert and 0.64 [IQR, 0.45–0.68] with CheXbert) and NIH (κ median, 0.52 [IQR, 0.41–0.65] with CheXpert and 0.55 [IQR, 0.41–0.74] with CheXbert) data sets, respectively. Vicuna with prompt 2 performed at par (median AUC, 0.84 [IQR, 0.74–0.93]) with both labelers on nine of 11 findings.
Conclusion: In this proof-of-concept study, outputs of the LLM Vicuna reporting the presence or absence of 13 findings on chest radiography reports showed moderate to substantial agreement with existing labelers.
© RSNA, 2023. Supplemental material is available for this article. See also the editorial by Cai in this issue.
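The two evaluation statistics named in the abstract, Fleiss κ for agreement between labelers and AUC against a reference annotation, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `fleiss_kappa` and `auc`, and the toy labels below, are hypothetical and chosen only to show how the quantities are computed for one binary finding.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists, e.g. [[1, 1], [0, 1], ...], one inner list per report,
    one label per rater (hypothetical layout, e.g. an LLM output and a rule-based labeler).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: fraction of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_chance = sum(s ** 2 for s in shares)

    if p_chance == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_observed - p_chance) / (1.0 - p_chance)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative reference label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): presence/absence of one finding on 10 reports, two labelers.
toy_pairs = [[1, 1], [1, 1], [0, 0], [0, 0], [1, 0],
             [0, 0], [1, 1], [0, 1], [0, 0], [1, 1]]
print(round(fleiss_kappa(toy_pairs), 2))  # 0.6

# Toy reference labels versus binary labeler outputs for four reports.
print(auc([1, 0, 1, 0], [1, 0, 0, 0]))  # 0.75
```

Because `fleiss_kappa` accepts any fixed number of raters per report, the same function would also cover agreement across three repeated runs of one model, as described in the Materials and Methods.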