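Cirrhosis
Medicine
Gold standard (test)
Ascites
Hepatic encephalopathy
Spontaneous bacterial peritonitis
Chart
Internal medicine
Algorithm
Artificial intelligence
Machine learning
Gastroenterology
Computer science
Statistics
Mathematics
Authors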
Aryana Far, Asal Bastani, Albert Lee, Oksana Gologorskaya, Chiung‐Yu Huang, Mark J. Pletcher, Jennifer C. Lai, Jin Ge
Identifier
DOI:10.1097/hep.0000000000001115
Abstract
Background: Diagnosis code classification is a common method for cohort identification in cirrhosis research, but it is often inaccurate and augmented by labor-intensive chart review. Natural language processing (NLP) using large language models (LLMs) is a potentially more accurate method. To assess LLMs’ potential for cirrhosis cohort identification, we compared code-based versus LLM-based classification with chart review as a “gold standard.”
Methods: We extracted and conducted a limited chart review of 3,788 discharge summaries of cirrhosis admissions. We engineered zero-shot prompts using Generative Pre-trained Transformer (GPT)-4 to determine whether cirrhosis and its complications were active hospitalization problems. We calculated positive predictive values (PPVs) of LLM-based classification versus limited chart review, and PPVs of code-based versus LLM-based classification as a “silver standard” in all 3,788 summaries.
Results: Versus gold standard chart review, code-based classification achieved PPVs of 82.2% for identifying cirrhosis, 41.7% hepatic encephalopathy, 72.8% ascites, 59.8% gastrointestinal bleeding, and 48.8% spontaneous bacterial peritonitis. Compared to chart review, GPT-4 achieved 87.8-98.8% accuracies for identifying cirrhosis and its complications. Using LLM as a silver standard, code-based classification achieved PPVs of 79.8% for identifying cirrhosis, 53.9% hepatic encephalopathy, 55.3% ascites, 67.6% gastrointestinal bleeding, and 65.5% spontaneous bacterial peritonitis.
Conclusions: LLM-based classification was highly accurate versus manual chart review in identifying cirrhosis and its complications – this allowed us to assess the performance of code-based classification at scale using LLMs as a silver standard. These results suggest LLMs could augment or replace code-based cohort classification and raise questions regarding the necessity of chart review.
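The two metrics reported in the abstract, positive predictive value (PPV) and accuracy, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the toy labels below are hypothetical and stand in for per-summary flags from a classifier (diagnosis codes or an LLM) and from a reference standard (chart review or the LLM "silver standard").

```python
from typing import Sequence


def positive_predictive_value(predicted: Sequence[bool],
                              reference: Sequence[bool]) -> float:
    """PPV = TP / (TP + FP): share of predicted positives the reference confirms."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    true_pos = sum(1 for p, r in zip(predicted, reference) if p and r)
    false_pos = sum(1 for p, r in zip(predicted, reference) if p and not r)
    if true_pos + false_pos == 0:
        raise ValueError("PPV is undefined when there are no positive predictions")
    return true_pos / (true_pos + false_pos)


def accuracy(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """Accuracy = (TP + TN) / N: share of all cases where the labels agree."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(1 for p, r in zip(predicted, reference) if p == r)
    return agree / len(predicted)


# Toy labels (hypothetical, not study data): one flag per discharge summary.
code_based = [True, True, True, False, True]     # e.g. diagnosis code present
chart_review = [True, False, True, False, True]  # e.g. reviewer confirms condition

ppv = positive_predictive_value(code_based, chart_review)
acc = accuracy(code_based, chart_review)
print(f"PPV = {ppv:.2f}, accuracy = {acc:.2f}")
```

PPV only considers the cases a method flags as positive, so it says nothing about missed cases; accuracy counts agreement over all cases, which is why the abstract reports the two separately.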