Generative grammar
Transformer
Medicine
Computer science
Artificial intelligence
Engineering
Electrical engineering
Voltage
Authors
Shaoting Luo,Federico Canavese,Alaric Aroojis,Antonio Andreacchio,Darko Antičević,Maryse Bouchard,Pablo Castañeda,Vincenzo De Rosa,Michel Armand Fiogbe,Steven L. Frick,James Hoi Po Hui,Ashok Johari,Antonio Loro,Xuemin Lyu,Masaki Matsushita,Hakan Ömeroğlu,David P. Roye,Maulin Shah,Bicheng Yong,Lianyong Li
Source
Journal: Journal of Pediatric Orthopaedics
[Ovid Technologies (Wolters Kluwer)]
Date: 2024-04-09
Volume (Issue): 44 (6): e504-e511
Citations: 1
Identifiers
DOI:10.1097/bpo.0000000000002682
Abstract
Objective: There is increasing interest in applying artificial intelligence chatbots like generative pretrained transformer 4 (GPT-4) in the medical field. This study aimed to explore the universality of GPT-4 responses to simulated clinical scenarios of developmental dysplasia of the hip (DDH) across diverse global settings.
Methods: Seventeen international experts with more than 15 years of experience in pediatric orthopaedics were selected for the evaluation panel. Eight simulated DDH clinical scenarios were created, covering 4 key areas: (1) initial evaluation and diagnosis, (2) initial examination and treatment, (3) nursing care and follow-up, and (4) prognosis and rehabilitation planning. Each scenario was completed independently in a new GPT-4 session. Interrater reliability was assessed using Fleiss kappa, and the quality, relevance, and applicability of GPT-4 responses were analyzed using median scores and interquartile ranges. Following scoring, experts met in Zoom sessions to generate Regional Consensus Assessment Scores, which were intended to represent a consistent regional assessment of the use of GPT-4 in pediatric orthopaedic care.
Results: GPT-4’s responses to the 8 clinical DDH scenarios received performance scores ranging from 44.3% to 98.9% of the 88-point maximum. The Fleiss kappa statistic of 0.113 (P = 0.001) indicated low agreement among experts in their ratings. When assessing the responses’ quality, relevance, and applicability, the median scores were 3, with interquartile ranges of 3 to 4, 3 to 4, and 2 to 3, respectively. Significant differences were noted in the prognosis and rehabilitation domain scores (P < 0.05 for all). Regional consensus scores were 75 for Africa, 74 for Asia, 73 for India, 80 for Europe, and 65 for North America, with the Kruskal-Wallis test highlighting significant disparities between these regions (P = 0.034).
Conclusions: This study demonstrates the promise of GPT-4 in pediatric orthopaedic care, particularly in supporting preliminary DDH assessments and guiding treatment strategies for specialist care. However, effective integration of GPT-4 into clinical practice will require adaptation to specific regional health care contexts, highlighting the importance of a nuanced approach to health technology adaptation. Level of Evidence: Level IV.
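For readers unfamiliar with the Fleiss kappa statistic mentioned in the abstract, the following is a minimal sketch of the standard formula in plain Python. It is not the authors' analysis code. The function name `fleiss_kappa` and the small ratings table are hypothetical and serve only as illustration; the study's actual rating data are not reproduced here.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items        # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: 4 items, 5 raters, 3 score categories.
example_counts = [
    [5, 0, 0],
    [2, 2, 1],
    [1, 3, 1],
    [0, 1, 4],
]
print(round(fleiss_kappa(example_counts), 3))
```

Values near 0 indicate agreement close to chance and values near 1 indicate near-perfect agreement, which is why the reported 0.113 is described as low agreement.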