A Comparative Study of Large Language Models, Human Experts, and Expert-Edited Large Language Models to Neuro-Ophthalmology Questions

Subject tags: Empathy, Psychology, Quality (philosophy), Medicine, Psychiatry, Philosophy, Epistemology
Authors
Prashant D. Tailor, Lauren A. Dalvin, Matthew R. Starr, Deena Tajfirouz, Kevin D. Chodnicki, Michael C. Brodsky, Sasha A. Mansukhani, Heather E. Moss, Kevin E. Lai, Melissa W. Ko, Devin D. Mackay, Marie A. Di Nome, Oana M. Dumitrascu, Misha Pless, Eric Eggenberger, John J. Chen
Source
Journal: Journal of Neuro-Ophthalmology [Lippincott Williams & Wilkins]
Times cited: 5
Identifier
DOI: 10.1097/wno.0000000000002145
Abstract

Background: While large language models (LLMs) are increasingly used in medicine, their effectiveness compared with human experts remains unclear. This study evaluates the quality and empathy of responses to neuro-ophthalmology questions written by human experts, generated by LLMs, and generated by an LLM and then edited by an expert (Expert + AI).

Methods: This randomized, masked, multicenter cross-sectional study was conducted from June to July 2023. We randomly assigned 21 neuro-ophthalmology questions to 13 experts. Each expert wrote an answer and then edited a ChatGPT-4–generated response, timing both tasks. In addition, 5 LLMs (ChatGPT-3.5, ChatGPT-4, Claude 2, Bing, and Bard) generated responses. Anonymized, randomized responses from Expert + AI, human experts, and LLMs were rated by the remaining 12 experts. The main outcome was the mean score for quality and for empathy, each rated on a 1–5 scale.

Results: Response types differed significantly in both quality and empathy (P < 0.0001 for each). For quality, Expert + AI (4.16 ± 0.81) performed best, followed by GPT-4 (4.04 ± 0.92), GPT-3.5 (3.99 ± 0.87), Claude (3.6 ± 1.09), Expert (3.56 ± 1.01), Bard (3.5 ± 1.15), and Bing (3.04 ± 1.12). For empathy, Expert + AI (3.63 ± 0.87) had the highest score, followed by GPT-4 (3.6 ± 0.88), Bard (3.54 ± 0.89), GPT-3.5 (3.5 ± 0.83), Bing (3.27 ± 1.03), Expert (3.26 ± 1.08), and Claude (3.11 ± 0.78). Expert + AI outperformed Expert in both quality (P < 0.0001) and empathy (P = 0.002). The time experts spent writing their own responses and editing LLM responses was similar (P = 0.75).

Conclusions: Expert-edited LLM responses received the highest expert-determined ratings of quality and empathy, warranting further exploration of their potential benefits in clinical settings.
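As a quick arithmetic check on the figures above, the sketch below tabulates only the reported mean scores and confirms the stated rankings and the Expert + AI versus Expert gaps. It omits the SDs, per-rater data, and the authors' significance tests. A raw difference in means says nothing about statistical significance on its own. The helper names `rank` and `gap` are hypothetical and illustrative, and this is not the study's analysis code.

```python
# Mean ratings (1-5 scale) as reported in the abstract; SDs deliberately omitted.
quality = {
    "Expert + AI": 4.16,
    "GPT-4": 4.04,
    "GPT-3.5": 3.99,
    "Claude": 3.6,
    "Expert": 3.56,
    "Bard": 3.5,
    "Bing": 3.04,
}
empathy = {
    "Expert + AI": 3.63,
    "GPT-4": 3.6,
    "Bard": 3.54,
    "GPT-3.5": 3.5,
    "Bing": 3.27,
    "Expert": 3.26,
    "Claude": 3.11,
}


def rank(scores):
    """Return response types ordered from highest to lowest mean score (hypothetical helper)."""
    return sorted(scores, key=scores.get, reverse=True)


def gap(scores, a, b):
    """Difference in mean score between response types a and b, rounded to 2 decimals."""
    return round(scores[a] - scores[b], 2)


if __name__ == "__main__":
    for label, scores in (("Quality", quality), ("Empathy", empathy)):
        print(label, rank(scores))
        print("  Expert + AI minus Expert:", gap(scores, "Expert + AI", "Expert"))
```

Because each dictionary is written in the order the abstract reports, sorting by mean reproduces that same order. The difference between Expert + AI and Expert is 0.6 points for quality and 0.37 points for empathy.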