Evaluating the Potential of Large Language Models for Vestibular Rehabilitation Education: A Comparison of ChatGPT, Google Gemini, and Clinicians

认证考试（生物学）康复多项选择临床实习心理学前庭康复医学教育应用心理学物理医学与康复医学物理疗法显著性差异法学古生物学内科学生物政治学

作者

Yael Arbel,Yoav Gimmon,Liora Shmueli

出处

期刊：Physical therapy [Oxford University Press]
日期：2025-02-11

链接

nih.govdoi.org

标识

DOI：10.1093/ptj/pzaf010

摘要

Abstract Objective This study aimed to compare the performance of 2 large language models, ChatGPT and Google Gemini, against experienced physical therapists and students in responding to multiple-choice questions related to vestibular rehabilitation. The study further aimed to assess the accuracy of ChatGPT’s responses by board-certified otoneurologists. Methods This study was conducted among 30 physical therapist professionals experienced with vestibular rehabilitation and 30 physical therapist students. They were asked to complete a vestibular knowledge test (VKT) consisting of 20 multiple-choice questions that were divided into 3 categories: (1) Clinical Knowledge, (2) Basic Clinical Practice, and (3) Clinical Reasoning. ChatGPT and Google Gemini were tasked with answering the same 20 VKT questions. Three board-certified otoneurologists independently evaluated the accuracy of each response using a 4-level scale, ranging from comprehensive to completely incorrect. Results ChatGPT outperformed Google Gemini with a 70% score on the VKT test, while Gemini scored 60%. Both excelled in Clinical Knowledge scoring 100% but struggled in Clinical Reasoning with ChatGPT scoring 50% and Gemini scoring 25%. According to 3 otoneurologic experts, ChatGPT’s accuracy was considered “comprehensive” in 45% of the 20 questions, while 25% were found to be completely incorrect. ChatGPT provided “comprehensive” responses in 50% of Clinical Knowledge and Basic Clinical Practice questions, but only 25% in Clinical Reasoning. Conclusion Caution is advised when using ChatGPT and Google Gemini due to their limited accuracy in clinical reasoning. While they provide accurate responses concerning Clinical Knowledge, their reliance on web information may lead to inconsistencies. ChatGPT performed better than Gemini. Health care professionals should carefully formulate questions and be aware of the potential influence of the online prevalence of information on ChatGPT’s and Google Gemini’s responses. Combining clinical expertise and clinical guidelines with ChatGPT and Google Gemini can maximize benefits while mitigating limitations. The results are based on current models of ChatGPT3.5 and Google Gemini. Future iterations of these models are expected to offer improved accuracy as the underlying modeling and algorithms are further refined. Impact This study highlights the potential utility of large language models like ChatGPT in supplementing clinical knowledge for physical therapists, while underscoring the need for caution in domains requiring complex clinical reasoning. The findings emphasize the importance of integrating technological tools carefully with human expertise to enhance patient care and rehabilitation outcomes.

求助该文献

Evaluating the Potential of Large Language Models for Vestibular Rehabilitation Education: A Comparison of ChatGPT, Google Gemini, and Clinicians

今日热心研友