Keywords
Helpfulness
Medicine
Readability
Chatbot
Medical physics
Frequently asked questions
Artificial intelligence
Psychology
Medical education
Comprehension
Social psychology
Computer science
Programming language
Authors
Avi A. Gajjar, Rohit Prem Kumar, Ethan Paliwoda, Cathleen C. Kuo, Samuel Adida, Andrew D. Legarreta, Hansen Deng, Sharath Kumar Anand, D. Kojo Hamilton, Thomas J. Buell, Nitin Agarwal, Peter C. Gerszten, Joseph S. Hudson
Source
Journal: Neurosurgery
[Oxford University Press]
Date: 2024-02-14
Citations: 13
Identifiers
DOI:10.1227/neu.0000000000002856
Abstract
BACKGROUND AND OBJECTIVES: The Internet has become a primary source of health information, leading patients to seek answers online before consulting health care providers. This study aims to evaluate the implementation of Chat Generative Pre-Trained Transformer (ChatGPT) in neurosurgery by assessing the accuracy and helpfulness of artificial intelligence (AI)-generated responses to common postsurgical questions.
METHODS: A list of 60 commonly asked questions regarding neurosurgical procedures was developed. ChatGPT-3.0, ChatGPT-3.5, and ChatGPT-4.0 responses to these questions were recorded and graded by multiple practitioners for accuracy and helpfulness. The understandability and actionability of the answers were assessed using the Patient Education Materials Assessment Tool (PEMAT). Readability was analyzed using established scales.
RESULTS: A total of 1080 responses were evaluated, equally divided among ChatGPT-3.0, 3.5, and 4.0, each contributing 360 responses. The mean helpfulness score across the 3 subsections was 3.511 ± 0.647, while the mean accuracy score was 4.165 ± 0.567. The PEMAT analysis revealed that the AI-generated responses scored higher on actionability than on understandability, indicating that the answers provided practical guidance and recommendations that patients could apply effectively. By contrast, the mean Flesch Reading Ease score was 33.5, indicating relatively complex text, and the Raygor Readability Estimate scores fell within the graduate range, with an average grade level of 15.
CONCLUSION: The AI chatbot's responses, although factually accurate, were not rated as highly helpful, with only marginal differences in perceived helpfulness and accuracy between the ChatGPT-3.0 and ChatGPT-3.5 versions. Despite this, the responses from ChatGPT-4.0 showed a notable improvement in understandability, indicating enhanced readability over earlier versions.
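For context on the readability metric cited above: the Flesch Reading Ease score is computed from average sentence length and syllables per word, with lower scores indicating harder text; the reported mean of 33.5 falls in the conventional "difficult/college" band (30-49). The sketch below is a minimal illustration of the standard formula, not the tooling the authors used; the vowel-group syllable counter and the sample sentence are assumptions for demonstration only.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic (assumption): count contiguous vowel groups, minimum 1.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Standard formula: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    # Lower scores mean harder text; ~30-49 corresponds to college-level difficulty.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Hypothetical chatbot-style response, scored for illustration.
sample = ("Postoperative pain is common after lumbar fusion. "
          "Your surgeon may recommend a short course of analgesics and early mobilization.")
print(round(flesch_reading_ease(sample), 1))
```

Production readability tooling typically uses dictionary-based syllable counts rather than a vowel-group heuristic, so scores from this sketch will differ somewhat from published values.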