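Automatic summarization
Computer science
Classification
Documentation
Adaptation (eye)
Natural language processing
Workflow
Set (abstract data type)
Medical record
Health care
Artificial intelligence
Data science
Medicine
Psychology
Radiology
Database
Neuroscience
Economics
Programming language
Economic growth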
Authors
Dave Van Veen, Cara Van Uden, Louis Blankemeier, Jean-Benoit Delbrouck, Asad Aali, Christian Bluethgen, Anuj Pareek, Malgorzata Polacin, Eduardo Pontes Reis, Anna Seehofnerová, Nidhi Rohatgi, Poonam Hosamani, William Collins, Neera Ahuja, Curtis P. Langlotz, Jason Hom, Sergios Gatidis, John M. Pauly, Akshay Chaudhari
Source
Journal: Nature Medicine
[Springer Nature]
Date: 2024-02-27
Volume (issue): 30 (4): 1134-1142
Citations: 53
Identifiers
DOI: 10.1038/s41591-024-02855-5
Abstract
Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP) tasks, their effectiveness on a diverse range of clinical summarization tasks remains unproven. Here we applied adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes and doctor–patient dialogue. Quantitative assessments with syntactic, semantic and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with 10 physicians evaluated summary completeness, correctness and conciseness; in most cases, summaries from our best-adapted LLMs were deemed either equivalent (45%) or superior (36%) compared with summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.
Comparative performance assessment of large language models identified ChatGPT-4 as the best-adapted model across a diverse set of clinical text summarization tasks, and it outperformed 10 medical experts in a reader study.