Objective
To investigate the feasibility and performance of Chat Generative Pre-trained Transformer (ChatGPT) in converting free-text symptom narratives into structured symptom labels.
Methods
We extracted symptoms from 300 deidentified symptom narratives of COVID-19 patients using two approaches: a computer-based matching algorithm (the reference standard) and prompt engineering in ChatGPT, prompting both without examples (zero-shot prompting) and with examples (few-shot prompting). Common symptoms were defined as those with a prevalence >10% according to the standard, and less common symptoms as those with a prevalence of 2–10%. The performance of ChatGPT was evaluated against the standard using sensitivity and specificity with 95% exact binomial confidence intervals (95% binCIs).
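As a minimal sketch (not drawn from the study), the zero-shot and few-shot conditions differ only in whether a worked example precedes the instructions. The symptom list, prompt wording, example narrative, model name, and client usage below are illustrative assumptions, not the study's actual prompts:

```python
# Illustrative zero-shot vs. few-shot prompting for symptom labelling.
# All prompt text and names here are hypothetical, not the study's.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYMPTOMS = ["fever", "cough", "sore throat", "runny nose", "fatigue"]

# Zero-shot: instructions only, no worked example.
ZERO_SHOT = (
    "For each symptom in [" + ", ".join(SYMPTOMS) + "], answer 1 if it is "
    "present in the narrative and 0 if it is absent. Return a JSON object "
    "mapping each symptom to 0 or 1.\n\nNarrative: "
)

# Few-shot: the same instructions preceded by a worked example.
FEW_SHOT = (
    "Example narrative: Had a burning throat and felt hot all day.\n"
    'Example labels: {"fever": 1, "cough": 0, "sore throat": 1, '
    '"runny nose": 0, "fatigue": 0}\n\n' + ZERO_SHOT
)

def label(narrative: str, prompt_prefix: str, model: str = "gpt-4") -> str:
    """Send one narrative to the model and return its raw reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt_prefix + narrative}],
        temperature=0,  # reduce run-to-run variation in labelling
    )
    return response.choices[0].message.content

print(label("Coughing a lot, no fever.", ZERO_SHOT))
```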
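Similarly, a minimal sketch of the evaluation step follows, assuming per-narrative binary labels for one symptom are compared against the matching algorithm's output. The Clopper-Pearson construction shown is the usual "exact" binomial CI; the helper names are ours, not the study's:

```python
# Sensitivity/specificity against a reference standard, with 95% exact
# (Clopper-Pearson) binomial CIs. Variable names are illustrative.
from scipy.stats import beta

def exact_binomial_ci(x: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Clopper-Pearson ('exact') CI for x successes out of n trials."""
    alpha = 1.0 - level
    lower = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return float(lower), float(upper)

def sensitivity_specificity(reference: list[int], predicted: list[int]):
    """Compare binary labels for one symptom against the reference standard."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    sens, sens_ci = tp / (tp + fn), exact_binomial_ci(tp, tp + fn)
    spec, spec_ci = tn / (tn + fp), exact_binomial_ci(tn, tn + fp)
    return sens, sens_ci, spec, spec_ci

# Sanity check against a figure quoted in the Results: detecting 3 of 15
# true positives gives sensitivity 0.200, 95% binCI roughly (0.043, 0.481).
print(exact_binomial_ci(3, 15))
```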
Results
In zero-shot prompting, GPT-4 achieved high specificity for all symptoms (0.947 [95% binCI: 0.894–0.978] to 1.000 [95% binCI: 0.965–1.000]), high sensitivity for common symptoms (0.853 [95% binCI: 0.689–0.950] to 1.000 [95% binCI: 0.951–1.000]), and moderate sensitivity for less common symptoms (0.200 [95% binCI: 0.043–0.481] to 1.000 [95% binCI: 0.590–1.000]). Few-shot prompting increased both sensitivity and specificity. GPT-4 outperformed GPT-3.5 in response accuracy and labelling consistency.
Discussion
This work substantiates ChatGPT's role as a research tool in medicine. Its performance in converting symptom narratives into structured symptom labels was encouraging, saving the time and effort of compiling task-specific training data. It could accelerate the compilation and synthesis of free-text data in future disease outbreaks and improve the accuracy of symptom checkers. Prompt engineering focused on ambiguous symptom descriptions could further benefit medical research.