Evaluating the positive predictive value of code-based identification of cirrhosis and its complications utilizing GPT-4

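Cirrhosis, Medicine, Gold standard (test), Ascites, Hepatic encephalopathy, Spontaneous bacterial peritonitis, Chart, Internal medicine, Algorithm, Artificial intelligence, Machine learning, Gastroenterology, Computer science, Statistics, Mathematics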

Authors

Aryana T. Far, Asal Bastani, Albert Lee, Oksana Gologorskaya, Chiung‐Yu Huang, Mark J. Pletcher, Jennifer C. Lai, Jin Ge

Source

Journal: Hepatology [Wiley]
Date: 2024-10-08

Links

nih.gov, doi.org

Identifiers

DOI: 10.1097/hep.0000000000001115

Abstract

Background and Aims: Diagnosis code classification is a common method for cohort identification in cirrhosis research, but it is often inaccurate and augmented by labor-intensive chart review. Natural language processing using large language models (LLMs) is a potentially more accurate method. To assess LLMs’ potential for cirrhosis cohort identification, we compared code-based versus LLM-based classification with chart review as a “gold standard.”

Approach and Results: We extracted and conducted a limited chart review of 3788 discharge summaries of cirrhosis admissions. We engineered zero-shot prompts using a Generative Pre-trained Transformer 4 to determine whether cirrhosis and its complications were active hospitalization problems. We calculated positive predictive values (PPVs) of LLM-based classification versus limited chart review and PPVs of code-based versus LLM-based classification as a “silver standard” in all 3788 summaries. Compared to gold standard chart review, code-based classification achieved PPVs of 82.2% for identifying cirrhosis, 41.7% for hepatic encephalopathy (HE), 72.8% for ascites, 59.8% for gastrointestinal bleeding, and 48.8% for spontaneous bacterial peritonitis. Compared to the chart review, Generative Pre-trained Transformer 4 achieved 87.8%–98.8% accuracies for identifying cirrhosis and its complications. Using LLM as a silver standard, code-based classification achieved PPVs of 79.8% for identifying cirrhosis, 53.9% for HE, 55.3% for ascites, 67.6% for gastrointestinal bleeding, and 65.5% for spontaneous bacterial peritonitis.

Conclusions: LLM-based classification was highly accurate versus manual chart review in identifying cirrhosis and its complications. This allowed us to assess the performance of code-based classification at scale using LLMs as a silver standard. These results suggest LLMs could augment or replace code-based cohort classification and raise questions regarding the necessity of chart review.
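The PPVs reported in the abstract follow the standard definition: of the records a classifier flags as positive, the fraction that the reference standard also marks as positive. The sketch below illustrates that arithmetic only. The function name `positive_predictive_value` and the toy labels are assumptions made up for illustration; they are not the authors' code or data.

```python
from typing import Sequence


def positive_predictive_value(flagged: Sequence[bool], reference: Sequence[bool]) -> float:
    """PPV = true positives / (true positives + false positives).

    `flagged` holds the classifier's calls (e.g. diagnosis codes);
    `reference` holds the standard they are judged against
    (e.g. chart review, or an LLM used as a "silver standard").
    """
    if len(flagged) != len(reference):
        raise ValueError("flagged and reference must have the same length")
    true_pos = sum(1 for f, r in zip(flagged, reference) if f and r)
    false_pos = sum(1 for f, r in zip(flagged, reference) if f and not r)
    if true_pos + false_pos == 0:
        raise ValueError("no positive classifications; PPV is undefined")
    return true_pos / (true_pos + false_pos)


# Toy labels, invented for illustration only (not from the study).
code_flag = [True, True, True, True, False, True]
chart_review = [True, False, True, True, True, False]

ppv = positive_predictive_value(code_flag, chart_review)
print(f"PPV = {ppv:.1%}")  # prints "PPV = 60.0%" (3 true positives, 2 false positives)
```

The same function would apply unchanged to the silver-standard comparison by passing LLM-derived labels as `reference`. Records the classifier does not flag never enter the calculation, which is why PPV alone says nothing about missed cases.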
