Intensive care
Neonatal intensive care unit
Medicine
Clinical decision support system
Clinical clerkship
Clinical judgment
Leverage (statistics)
Artificial intelligence
Decision support system
Medical physics
Nursing
Intensive care medicine
Computer science
Pediatrics
Authors
Chedva Levin,Tehilla Kagan,Shani Rosen,Mor Saban
Identifiers
DOI:10.1016/j.ijnurstu.2024.104771
Abstract
To assess the clinical reasoning capabilities of two large language models, ChatGPT-4 and Claude-2.0, compared with those of neonatal nurses in neonatal care scenarios.

A cross-sectional study with a comparative evaluation, using a survey instrument that included six neonatal intensive care unit clinical scenarios.

Thirty-two neonatal intensive care nurses with 5–10 years of experience, working in the neonatal intensive care units of three medical centers.

Participants responded to the six written clinical scenarios. Simultaneously, we asked ChatGPT-4 and Claude-2.0 to provide initial assessments and treatment recommendations for the same scenarios. The responses from ChatGPT-4 and Claude-2.0 were then scored by certified neonatal nurse practitioners for accuracy, completeness, and response time.

Both models demonstrated clinical reasoning capabilities in neonatal care, with Claude-2.0 significantly outperforming ChatGPT-4 in clinical accuracy and speed. However, limitations were identified across the cases in diagnostic precision, treatment specificity, and response lag.

While the models show promise, their current limitations reinforce the need for substantial refinement before ChatGPT-4 and Claude-2.0 can be considered for integration into clinical practice. Additional validation of these tools is important to safely leverage this artificial intelligence technology for enhancing clinical decision-making. The study provides an understanding of the reasoning accuracy of new artificial intelligence models in neonatal clinical care; the current accuracy gaps of ChatGPT-4 and Claude-2.0 need to be addressed before clinical use.