Keywords
Helpfulness
Medicine
Readability
Chatbot
Medical physics
Frequently asked questions
Artificial intelligence
Psychology
Medical education
Comprehension
Social psychology
Computer science
Programming language
Authors
Avi A. Gajjar, Rohit Prem Kumar, Ethan Paliwoda, Cathleen C. Kuo, Samuel Adida, Andrew D. Legarreta, Hansen Deng, Sharath Kumar Anand, D. Kojo Hamilton, Thomas J. Buell, Nitin Agarwal, Peter C. Gerszten, Joseph S. Hudson
Source
Journal: Neurosurgery
[Oxford University Press]
Date: 2024-02-14
Citations: 13
Identifiers
DOI:10.1227/neu.0000000000002856
Abstract
BACKGROUND AND OBJECTIVES: The Internet has become a primary source of health information, leading patients to seek answers online before consulting health care providers. This study aims to evaluate the implementation of Chat Generative Pre-Trained Transformer (ChatGPT) in neurosurgery by assessing the accuracy and helpfulness of artificial intelligence (AI)-generated responses to common postsurgical questions.
METHODS: A list of 60 commonly asked questions regarding neurosurgical procedures was developed. ChatGPT-3.0, ChatGPT-3.5, and ChatGPT-4.0 responses to these questions were recorded and graded by multiple practitioners for accuracy and helpfulness. The understandability and actionability of the answers were assessed using the Patient Education Materials Assessment Tool (PEMAT). Readability was analyzed using established scales.
RESULTS: A total of 1080 responses were evaluated, equally divided among ChatGPT-3.0, 3.5, and 4.0, each contributing 360 responses. The mean helpfulness score across the 3 subsections was 3.511 ± 0.647, while the mean accuracy score was 4.165 ± 0.567. The PEMAT analysis revealed that the AI-generated responses scored higher on actionability than on understandability, indicating that the answers provided practical guidance and recommendations that patients could apply effectively. By contrast, the mean Flesch Reading Ease score was 33.5, indicating relatively complex text, and the Raygor Readability Estimate scores fell within the graduate range, with an average grade level of 15.
CONCLUSION: The AI chatbot's responses, although factually accurate, were not rated as highly helpful, with only marginal differences in perceived helpfulness and accuracy between the ChatGPT-3.0 and ChatGPT-3.5 versions. Despite this, the responses from ChatGPT-4.0 showed a notable improvement in understandability, indicating enhanced readability over earlier versions.
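For context on the readability metric cited above: the Flesch Reading Ease score is computed from average sentence length and syllables per word, with lower scores indicating harder text; the reported mean of 33.5 falls in the conventional "difficult/college" band (30-49). The sketch below is a minimal illustration of the standard formula, not the tooling the authors used; the vowel-group syllable counter and the sample sentence are assumptions for demonstration only.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic (assumption): count contiguous vowel groups, minimum 1.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Standard formula: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    # Lower scores mean harder text; ~30-49 corresponds to college-level difficulty.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Hypothetical chatbot-style response, scored for illustration.
sample = ("Postoperative pain is common after lumbar fusion. "
          "Your surgeon may recommend a short course of analgesics and early mobilization.")
print(round(flesch_reading_ease(sample), 1))
```

Production readability tooling typically uses dictionary-based syllable counts rather than a vowel-group heuristic, so scores from this sketch will differ somewhat from published values.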