Computer science
Decision support system
Clinical decision support system
Artificial intelligence
Authors
Gianluca Mondillo, Simone Colosimo, Alessandra Perrotta, Vittoria Frattolillo, Mariapia Masino
Source
Journal: Cold Spring Harbor Laboratory - medRxiv
Date: 2025-01-28
Identifier
DOI:10.1101/2025.01.27.25321169
Abstract
Introduction: The adoption of advanced reasoning models, such as ChatGPT O1 and DeepSeek-R1, represents a pivotal step forward in clinical decision support, particularly in pediatrics. ChatGPT O1 employs "chain-of-thought reasoning" (CoT) to enhance structured problem-solving, while DeepSeek-R1 introduces self-reflection capabilities through reinforcement learning. This study aimed to evaluate the diagnostic accuracy and clinical utility of these models in pediatric scenarios using the MedQA dataset.

Materials and Methods: A total of 500 multiple-choice pediatric questions from the MedQA dataset were presented to ChatGPT O1 and DeepSeek-R1. Each question included four or more options, with one correct answer. The models were evaluated under uniform conditions. Performance metrics included accuracy, with Cohen's Kappa and chi-square tests applied to assess agreement and statistical significance. Responses were analyzed to determine the models' effectiveness in addressing clinical questions.

Results: ChatGPT O1 achieved a diagnostic accuracy of 92.8%, significantly outperforming DeepSeek-R1, which scored 87.0% (p < 0.00001). The CoT reasoning technique used by ChatGPT O1 allowed for more structured and reliable responses, reducing the risk of errors. Conversely, DeepSeek-R1, while slightly less accurate, demonstrated superior accessibility and adaptability due to its open-source nature and emerging self-reflection capabilities. Cohen's Kappa (K = 0.20) indicated low agreement between the models, reflecting their distinct reasoning strategies.

Conclusions: This study highlights the strengths of ChatGPT O1 in providing accurate and coherent clinical reasoning, making it highly suitable for critical pediatric scenarios. DeepSeek-R1, with its flexibility and accessibility, remains a valuable tool in resource-limited settings. Combining these models in an ensemble system could leverage their complementary strengths, optimizing decision support in diverse clinical contexts. Further research is warranted to explore their integration into multidisciplinary care teams and their application in real-world clinical settings.
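
The abstract names accuracy and Cohen's Kappa as evaluation metrics but does not show how they are computed. The following is a minimal sketch of both, assuming Kappa is applied to the answer option each model chose per question (the abstract does not say whether the authors compared chosen options or correct/incorrect flags). The function names and the ten-question toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def accuracy(predictions, answers):
    """Fraction of predictions that match the answer key."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(ratings_a)
    # Observed agreement: share of items where both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement expected from each rater's marginal label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1:
        # Both raters always use the same single label; agreement is trivial.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data (NOT from the study): ten questions, options A-D.
answer_key = list("ABCDABCDAB")
model_a_toy = list("ABCDABCDAC")
model_b_toy = list("ABCDABCAAC")

print("toy accuracy, model A:", accuracy(model_a_toy, answer_key))
print("toy accuracy, model B:", accuracy(model_b_toy, answer_key))
print("toy kappa, A vs B:", cohens_kappa(model_a_toy, model_b_toy))
```

Kappa rescales raw agreement by the agreement expected from chance alone, which is why two models can both score well against the answer key and still show low Kappa: when both are usually correct, most of their agreement is already expected, so the statistic is driven by where their few errors fall.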