Medicine
Thematic analysis
Consistency (knowledge base)
Spinal manipulation
Spinal surgery
Conservative management
Question answering
Medical education
Surgery
Alternative medicine
Qualitative research
Pathology
Chiropractic
Artificial intelligence
Social science
Sociology
Computer science
Information retrieval
Authors
Mehmet Şahap,Michael McCarthy,M N Elmarawany,Stuart H. James,MP Grevitt,Rohan Jayasuriya,Andrew M. Jones,Andrew Bowey,D. Chan,Edward Bayley,Ian Harding,James Tomlinson,John P. Andrews,Shreya Srinivas
Identifier
DOI:10.1093/bjs/znae163.693
Abstract
Aim: Can Large Language Models (LLMs) provide answers to common controversial spinal surgery scenarios and resolve the dilemmas that we cannot?
Method: 54 highly detailed questions were developed on 18 scenarios, for example 'Management of Painless Foot Drop'. Nine consultant spinal surgeons answered the questions on two separate occasions. The questions were submitted to four LLMs, and the answers were regenerated five times for each. Response reproducibility and consistency were compared, and a thematic analysis of the AI answers was undertaken.
Results: Bing Chat was excluded from the study. ChatGPT3.5 refused to give a definitive answer in 14% of its answers, ChatGPT4 in 29% and Bard in 11%. ChatGPT3.5 suggested the user seek medical advice in 60% of its answers, ChatGPT4 in 99% and Bard in 45%. Surgeons stated they were confident in their answers in 96% of cases. AI answers were deemed decisive in 71% of cases for ChatGPT3.5, 24% for ChatGPT4 and 92% for Bard. Reproducibility of the consultants' answers averaged 63%, and 64% for the AIs overall. Agreement between the consultants on each question averaged 66%, and 64% between the AIs. Thematic analysis of the AI answers revealed themes including surgical and conservative management, an individualised approach, risks and benefits, consideration of the severity and duration of symptoms, and decision-making processes.
Conclusions: ChatGPT and Bard provide detailed answers to common controversial spinal surgery scenarios; however, they are not as decisive as consultants and agreed on fewer management plans. Many of these scenarios remained unanswered.