Computer science
Clinical decision-making
Variable (mathematics)
Radiology
Medicine
Family medicine
Mathematics
Mathematical analysis
Authors
Daniel Nguyen, Daniel R. Swanson, Alex Newbury, Young H. Kim
Identifier
DOI: 10.1016/j.acra.2023.11.002
Abstract
Large language models (LLMs) such as ChatGPT and Bard have emerged as powerful tools in medicine, showcasing strong results in tasks such as radiology report translations and research paper drafting. While their implementation in clinical practice holds promise, their response accuracy remains variable. This study aimed to evaluate the accuracy of ChatGPT and Bard in clinical decision-making based on the American College of Radiology Appropriateness Criteria for various cancers. Both LLMs were evaluated in terms of their responses to open-ended (OE) and select-all-that-apply (SATA) prompts. Furthermore, the study incorporated prompt engineering (PE) techniques to enhance the accuracy of LLM outputs. The results revealed similar performance between ChatGPT and Bard on OE prompts, with ChatGPT exhibiting marginally higher accuracy in SATA scenarios. The introduction of PE also marginally improved LLM outputs in OE prompts but did not enhance SATA responses. The results highlight the potential of LLMs in aiding clinical decision-making processes, especially when guided by optimally engineered prompts. Future studies in diverse clinical situations are imperative to better understand the impact of LLMs in radiology.
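To make the evaluation design concrete, the sketch below illustrates how OE and SATA prompts for this kind of study might be constructed and how a SATA response could be scored against an ACR-style answer key. The paper does not publish its code, so the prompt wording, the `query_llm` stub, the role-instruction PE variant, and the per-option scoring rule are all assumptions for illustration, not the authors' method.

```python
# Illustrative sketch only; all prompt templates and scoring rules here are
# assumptions, not the study's published protocol.

def build_oe_prompt(scenario: str) -> str:
    """Open-ended (OE) prompt: the model must name the imaging study itself."""
    return (
        f"Clinical scenario: {scenario}\n"
        "What is the most appropriate imaging study per the "
        "ACR Appropriateness Criteria?"
    )

def build_sata_prompt(scenario: str, options: list[str]) -> str:
    """Select-all-that-apply (SATA) prompt: the model picks from listed options."""
    listed = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    return (
        f"Clinical scenario: {scenario}\n"
        "Select ALL imaging studies that are usually appropriate per the "
        f"ACR Appropriateness Criteria:\n{listed}"
    )

def with_prompt_engineering(prompt: str) -> str:
    """One common PE technique (role instruction); the paper's actual PE may differ."""
    return "You are a board-certified radiologist. Answer concisely.\n" + prompt

def query_llm(prompt: str) -> set[str]:
    """Placeholder for a ChatGPT/Bard call; returns the studies the model chose."""
    raise NotImplementedError("substitute a real model API call here")

def sata_accuracy(chosen: set[str], appropriate: set[str],
                  options: list[str]) -> float:
    """Score one SATA answer as the fraction of options classified correctly."""
    correct = sum((opt in chosen) == (opt in appropriate) for opt in options)
    return correct / len(options)
```

Under this framing, overall accuracy for each model and prompt type is simply the mean of per-scenario scores, which is one straightforward way to compare OE versus SATA performance with and without PE.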