Keywords
Certification
Examination (biology)
Rehabilitation
Multiple choice
Clinical clerkship
Psychology
Vestibular rehabilitation
Medical education
Applied psychology
Physical medicine and rehabilitation
Medicine
Physical therapy
Significant difference
Law
Paleontology
Internal medicine
Biology
Political science
Authors
Yael Arbel, Yoav Gimmon, Liora Shmueli
Abstract
Objective: This study aimed to compare the performance of 2 large language models, ChatGPT and Google Gemini, against experienced physical therapists and physical therapist students in responding to multiple-choice questions on vestibular rehabilitation. It further aimed to assess the accuracy of ChatGPT's responses as judged by board-certified otoneurologists.

Methods: The study was conducted among 30 physical therapist professionals experienced in vestibular rehabilitation and 30 physical therapist students. Participants completed a vestibular knowledge test (VKT) consisting of 20 multiple-choice questions divided into 3 categories: (1) Clinical Knowledge, (2) Basic Clinical Practice, and (3) Clinical Reasoning. ChatGPT and Google Gemini were tasked with answering the same 20 VKT questions. Three board-certified otoneurologists independently evaluated the accuracy of each response on a 4-level scale, ranging from comprehensive to completely incorrect.

Results: ChatGPT outperformed Google Gemini on the VKT, scoring 70% versus 60%. Both models excelled in Clinical Knowledge, scoring 100%, but struggled in Clinical Reasoning, where ChatGPT scored 50% and Gemini 25%. According to the 3 otoneurologic experts, ChatGPT's responses were "comprehensive" for 45% of the 20 questions, while 25% were completely incorrect. ChatGPT provided "comprehensive" responses to 50% of Clinical Knowledge and Basic Clinical Practice questions, but to only 25% of Clinical Reasoning questions.

Conclusion: Caution is advised when using ChatGPT and Google Gemini because of their limited accuracy in clinical reasoning. Although they provide accurate responses concerning Clinical Knowledge, their reliance on web information may lead to inconsistencies. ChatGPT performed better than Gemini. Health care professionals should formulate questions carefully and be aware that the online prevalence of information may influence ChatGPT's and Google Gemini's responses. Combining clinical expertise and clinical guidelines with ChatGPT and Google Gemini can maximize benefits while mitigating limitations. The results are based on the current ChatGPT 3.5 and Google Gemini models; future iterations are expected to offer improved accuracy as the underlying models and algorithms are further refined.

Impact: This study highlights the potential utility of large language models like ChatGPT in supplementing clinical knowledge for physical therapists, while underscoring the need for caution in domains requiring complex clinical reasoning. The findings emphasize the importance of integrating technological tools carefully with human expertise to enhance patient care and rehabilitation outcomes.
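For readers curious how such a multiple-choice benchmark might be automated, the sketch below shows one plausible approach. It is a minimal illustration under stated assumptions, not the authors' protocol: the study does not publish its querying procedure, the 20 VKT items are not reproduced here, and the OpenAI Python client, model identifier, and placeholder question are all assumptions.

```python
# Hypothetical sketch: pose VKT-style multiple-choice questions to an LLM
# and tally the score. Question content and answer key are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative placeholder item; the actual VKT questions are not public.
questions = [
    {
        "stem": "Which exercise approach primarily targets gaze stabilization?",
        "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
        "answer": "A",  # hypothetical answer key
    },
]

def ask(question: dict) -> str:
    """Send one multiple-choice question and return the model's letter choice."""
    prompt = (
        question["stem"]
        + "\n"
        + "\n".join(f"{k}. {v}" for k, v in question["options"].items())
        + "\nAnswer with a single letter."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the study used ChatGPT 3.5; exact model ID assumed
        messages=[{"role": "user", "content": prompt}],
    )
    # Take the first character of the reply as the chosen option.
    return response.choices[0].message.content.strip()[:1].upper()

correct = sum(ask(q) == q["answer"] for q in questions)
print(f"Score: {correct}/{len(questions)}")
```

A study like this one would additionally collect the free-text responses and have expert raters grade each on the 4-level accuracy scale, rather than scoring letter choices alone.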