Appropriateness of Answers to Common Preanesthesia Patient Questions Composed by the Large Language Model GPT-4 Compared to Human Authors

Subjects: Medicine · Critical Care Medicine · Emergency Medical Services
Authors
Scott Segal, Amit K. Saha, Ashish K. Khanna
Source
Journal: Anesthesiology [Ovid Technologies (Wolters Kluwer)]
Volume/Issue: 140 (2): 333-335
Identifier
DOI: 10.1097/aln.0000000000004824
Abstract

Many surgical patients will not interact with anesthesiologists until minutes before surgery, and the internet has become a common source of medical information. Large language models such as GPT-4, "generative artificial intelligence" tools capable of creating natural, human-sounding prose in response to a plain-language query, and their incorporation into search engines promise to make it easier for patients to ask questions directly about preanesthetic preparation. The accuracy of large language models in answering medical questions has generally been impressive1–3 but has not been evaluated for preanesthetic queries. We evaluated the ability of the widely accessible model GPT-4 to provide reasonable responses to common preanesthetic patient questions, compared to online published resources. Our hypothesis was that GPT-4 was at least as reasonable as published resources.

The study was approved by the Wake Forest University School of Medicine institutional review board, and completion of the survey was deemed to indicate informed consent. Sixteen common preanesthetic questions were drawn from websites of academic anesthesiology departments (table 1; Supplemental Content 1, https://links.lww.com/ALN/D360). The online answers, and the answers to the same questions provided by GPT-4 via the ChatGPT Pro interface (chat.openai.com) queried on two dates in independent sessions in April 2023, were obtained. Two sessions with ChatGPT were used because the software regenerates new responses when reprompted, and the responses may differ in quality.1 Survey participants were preoperative anesthesia experts known to the investigators, and other similar experts suggested by this cohort, nearly all of whom were academicians involved with preoperative assessment (total solicited N = 210). The survey instructions asked raters to "evaluate answers to questions about anesthesia care that patients may ask.
Your task is simply to evaluate each statement as 'reasonable' or 'unreasonable.' Please select 'reasonable' unless you detect a significant error or major omission." For each question, the survey recipient was randomly presented with a single answer, without knowledge of its authorship, in an approximately 2:1 ratio of GPT-4–generated responses to website content. Participants then rated each answer as "reasonable" or "unreasonable."3 Respondents were anonymous unless they chose to give their name for acknowledgment (Supplemental Content 2, https://links.lww.com/ALN/D361). Enrollment was closed when fewer than one response per day was observed. The total percentages rated reasonable were compared for GPT-4 and website content, overall and for each question, with the Pearson chi-square or Fisher exact test. We estimated that 240 responses per group (i.e., GPT-4 or human authored) would be needed to detect a 10% difference in ratings, assuming 90% "reasonable" in the published statements, with 80% power and α = 0.05.

Seventy-four of 210 (35%) invited participants responded during the 10-day survey period. The combined results and those for each question are shown in table 2. Overall, GPT-4 answers were more frequently rated reasonable than published websites: 536/644 (83.2%) versus 328/435 (75.4%), P = 0.002. GPT-4's responses to four individual questions ("Why can't I eat before surgery," "What can I drink before surgery," "Why does the anesthesiologist want to know about my teeth," and "Can I have surgery without opioids?") were rated significantly higher than the corresponding human-authored responses (table 2). No human-authored answer was rated more reasonable than the corresponding GPT-4 answer.

The findings of this study suggest that experts in preoperative anesthesia care rate responses to common preoperative questions similarly whether generated by GPT-4 or provided on academic websites.
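The headline comparison above can be reproduced from the reported counts alone. The sketch below (an illustrative stdlib-only check, not the authors' code; the function name and implementation are my own) computes the Pearson chi-square statistic for the 2x2 table of reasonable/unreasonable ratings by author type, using the fact that a chi-square variable with one degree of freedom is the square of a standard normal:

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the
    2x2 table [[a, b], [c, d]]; returns (statistic, two-sided p)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for obs, row, col in ((a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        chi2 += (obs - expected) ** 2 / expected
    # With 1 df, chi2 = Z^2 for standard normal Z, so the upper tail
    # probability is P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Counts reported in the letter: GPT-4 answers rated reasonable in
# 536 of 644 ratings; website answers in 328 of 435 ratings.
chi2, p = pearson_chi2_2x2(536, 644 - 536, 328, 435 - 328)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # p rounds to the reported 0.002
```

This recovers the reported P = 0.002 for the overall comparison; the per-question comparisons in table 2 would use the same test or, for small expected counts, the Fisher exact test.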
Our results are similar to other reports of GPT-4 answers to potential patient questions in preventive cardiology,3 general medicine,4 and diabetes,5 although some poor-quality answers have been observed in other fields.6–8 Given the rapid growth in public access to generative artificial intelligence platforms, our finding of comparable human- and machine-authored answer quality is reassuring.

A strength of our study compared to previous reports is that we used a large number of human raters, blinded to the provenance of any given comment, rather than a small panel of experts, to evaluate the generative artificial intelligence responses. We believe this adds some generalizability to our findings. Conversely, our design has some limitations. We used a relatively coarse rating scale for the answers (reasonable or unreasonable) rather than Likert scales, although such an approach has been used by other investigators.3 A further refinement could be asking reviewers to evaluate the statements in specific domains, including overall accuracy, the presence of fabrications, and concordance, or internal consistency. The human-authored texts were all drawn from patient-facing websites of presumably reputable academic institutions but may not be representative of the overall quality of such sources (and poorer-performing online answers represent an opportunity for improvement). The anonymous nature of the survey also makes it impossible to assess nonresponder characteristics. Models such as GPT-4 are trained on millions of documents, which might include preoperative websites; they do not produce deterministic outputs, and they generate text through an iterative process of choosing the next word in a sentence.
While generating fluent, human-like prose, large language models are also well known to occasionally produce inaccurate results, known as hallucinations.9 Although we did not observe any such statements in this investigation, this remains a risk if these models are widely used by preoperative patients. Conversely, models such as GPT-4 and its successors could be used, in partnership with human experts, to curate responses and to generate patient-facing text on a variety of topics, including those posed and evaluated by patients themselves.

Overall, the findings of this study suggest that the generative artificial intelligence underlying large language models such as GPT-4 may be an effective source of medical information for patients preparing for anesthesia. Although caution is still in order until accuracy can be assured, models such as GPT-4 should be further studied for potential involvement in patient-facing activities in anesthesia care.

Support was provided solely from institutional and/or departmental sources. The authors declare no competing interests.

Supplemental Content:
1. Questions and answers evaluated in the study, https://links.lww.com/ALN/D360
2. Respondents wishing to be acknowledged, https://links.lww.com/ALN/D361