Keywords
Cronbach's alpha; Reliability (semiconductor); Likert scale; Psychology; Examination (biology); Chatbot; Validity; Content validity; Applied psychology; Computer science; Artificial intelligence; Clinical psychology; Psychometrics; Power (physics); Physics; Quantum mechanics; Developmental psychology; Paleontology; Biology
Authors
A. Johnson, Tarun Kumar Singh, Aakash Gupta, Hariram Sankar, Ikroop Gill, M. Shalini, N. Mohan
Abstract
Aim
This study aimed to assess the validity and reliability of AI chatbots, including Bing, ChatGPT 3.5, Google Gemini, and Claude AI, in addressing frequently asked questions (FAQs) related to dental trauma.

Methodology
A set of 30 FAQs was initially formulated by collecting responses from four AI chatbots. A panel comprising expert endodontists and maxillofacial surgeons then refined these to a final selection of 20 questions. Each question was entered into each chatbot three times, generating a total of 240 responses. These responses were evaluated using the Global Quality Score (GQS) on a 5-point Likert scale (5: strongly agree; 4: agree; 3: neutral; 2: disagree; 1: strongly disagree). Any disagreements in scoring were resolved through evidence-based discussion. The validity of the responses was determined by categorizing them as valid or invalid based on two thresholds: a low threshold (scores of ≥ 4 for all three responses) and a high threshold (scores of 5 for all three responses). A chi-squared test was used to compare the validity of the responses between the chatbots. Cronbach's alpha was calculated to assess reliability by evaluating the consistency of repeated responses from each chatbot.

Conclusion
The results indicate that the Claude AI chatbot demonstrated superior validity and reliability compared to ChatGPT and Google Gemini, whereas Bing was found to be less reliable. These findings underscore the need for authorities to establish strict guidelines to ensure the accuracy of medical information provided by AI chatbots.
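For illustration, the sketch below shows in Python how an analysis like the one described above could be run: each question is classified as valid under the low threshold (all three GQS scores ≥ 4) or the high threshold (all three scores = 5), the chatbots' valid/invalid counts are compared with a chi-squared test, and Cronbach's alpha is computed over the three repeated responses per chatbot. This is a minimal sketch under stated assumptions, not the authors' actual analysis: the score matrices are randomly generated example data, and the `cronbach_alpha` and `is_valid` helpers are hypothetical names introduced here.

```python
# Minimal sketch of the scoring analysis described in the abstract.
# All scores below are hypothetical; the paper's raw data are not given.
import numpy as np
from scipy.stats import chi2_contingency


def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_questions x n_repeats) score matrix.

    Treats the three repeated responses per question as 'items' and
    measures their internal consistency:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of totals).
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each repeat
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of per-question totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)


def is_valid(repeats, high_threshold: bool = False) -> bool:
    """Validity per the abstract: low threshold requires all three GQS
    scores >= 4; high threshold requires all three scores == 5."""
    cutoff = 5 if high_threshold else 4
    return all(score >= cutoff for score in repeats)


# Hypothetical GQS scores: 20 questions x 3 repeats for each chatbot.
rng = np.random.default_rng(0)
chatbots = {
    name: rng.integers(3, 6, size=(20, 3))
    for name in ["Bing", "ChatGPT 3.5", "Google Gemini", "Claude AI"]
}

# Reliability: consistency of the three repeated responses per chatbot.
for name, scores in chatbots.items():
    print(f"{name}: Cronbach's alpha = {cronbach_alpha(scores):.2f}")

# Validity: chi-squared test on valid/invalid counts across chatbots
# (low threshold shown; pass high_threshold=True for the strict version).
counts = [
    [sum(is_valid(row) for row in scores),
     len(scores) - sum(is_valid(row) for row in scores)]
    for scores in chatbots.values()
]
chi2, p, dof, _ = chi2_contingency(np.array(counts))
print(f"chi-squared = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```

One design note: treating the three repeated prompts as the "items" of the alpha calculation is a natural reading of the abstract's statement that reliability was assessed via the consistency of repeated responses, but the paper may have arranged the score matrix differently.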