Comparison of Medical Research Abstracts Written by Surgical Trainees and Senior Surgeons or Generated by Large Language Models

Medicine · Medical Education · Psychology
Authors
Alexis M. Holland,W Lorenz,Jack C. Cavanagh,Neil Smart,Sullivan A. Ayuso,Gregory T. Scarola,Kent W. Kercher,Lars Nannestad Jørgensen,Jeffrey E. Janis,John P. Fischer,B. Todd Heniford
Source
Journal: JAMA Network Open [American Medical Association]
Volume/issue: 7(8): e2425373. Cited by: 4
Identifier
DOI:10.1001/jamanetworkopen.2024.25373
Abstract

Importance: Artificial intelligence (AI) has permeated academia, particularly OpenAI's Chat Generative Pretrained Transformer (ChatGPT), a large language model. However, little has been reported on its use in medical research.

Objective: To assess a chatbot's capability to generate and grade medical research abstracts.

Design, Setting, and Participants: In this cross-sectional study, ChatGPT versions 3.5 and 4.0 (referred to as chatbot 1 and chatbot 2) were coached to generate 10 abstracts by providing background literature, prompts, analyzed data for each topic, and 10 previously presented, unassociated abstracts to serve as models. The study was conducted between August 2023 and February 2024 (including data analysis).

Exposure: Abstract versions using the same topic and data were written by a surgical trainee or a senior physician or generated by chatbot 1 and chatbot 2 for comparison. The 10 training abstracts were written by 8 surgical residents or fellows and edited by the same senior surgeon at a high-volume hospital in the Southeastern US with an emphasis on outcomes-based research. Abstract comparison was then based on 10 abstracts written by 5 surgical trainees within the first 6 months of their research year, edited by the same senior author.

Main Outcomes and Measures: The primary outcome measurements were the abstract grades using 10- and 20-point scales and ranks (first to fourth). Abstract versions by chatbot 1, chatbot 2, junior residents, and the senior author were compared and judged by blinded surgeon-reviewers as well as both chatbot models. Five academic attending surgeons from Denmark, the UK, and the US, with extensive experience in surgical organizations, research, and abstract evaluation, served as reviewers.

Results: Surgeon-reviewers were unable to differentiate between abstract versions. Each reviewer ranked an AI-generated version first at least once.
Abstracts demonstrated no difference in their median (IQR) 10-point scores (resident, 7.0 [6.0-8.0]; senior author, 7.0 [6.0-8.0]; chatbot 1, 7.0 [6.0-8.0]; chatbot 2, 7.0 [6.0-8.0]; P = .61), 20-point scores (resident, 14.0 [12.0-17.0]; senior author, 15.0 [13.0-17.0]; chatbot 1, 14.0 [12.0-16.0]; chatbot 2, 14.0 [13.0-16.0]; P = .50), or rank (resident, 3.0 [1.0-4.0]; senior author, 2.0 [1.0-4.0]; chatbot 1, 3.0 [2.0-4.0]; chatbot 2, 2.0 [1.0-3.0]; P = .14). The abstract grades given by chatbot 1 were comparable to the surgeon-reviewers' grades. However, chatbot 2 graded more favorably than the surgeon-reviewers and chatbot 1. Median (IQR) chatbot 2-reviewer grades were higher than surgeon-reviewer grades for all 4 abstract versions (resident, 14.0 [12.0-17.0] vs 16.9 [16.0-17.5]; P = .02; senior author, 15.0 [13.0-17.0] vs 17.0 [16.5-18.0]; P = .03; chatbot 1, 14.0 [12.0-16.0] vs 17.8 [17.5-18.5]; P = .002; chatbot 2, 14.0 [13.0-16.0] vs 16.8 [14.5-18.0]; P = .04). When comparing the grades of the 2 chatbots, chatbot 2 gave higher median (IQR) grades than chatbot 1 (resident, 14.0 [13.0-15.0] vs 16.9 [16.0-17.5]; P = .003; senior author, 13.5 [13.0-15.5] vs 17.0 [16.5-18.0]; P = .004; chatbot 1, 14.5 [13.0-15.0] vs 17.8 [17.5-18.5]; P = .003; chatbot 2, 14.0 [13.0-15.0] vs 16.8 [14.5-18.0]; P = .01).

Conclusions and Relevance: In this cross-sectional study, trained chatbots generated convincing medical abstracts, indistinguishable from resident or senior author drafts. Chatbot 1 graded abstracts similarly to surgeon-reviewers, while chatbot 2 was less stringent. These findings may assist surgeon-scientists in successfully implementing AI in medical research.
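The "coaching" step described in the Design section (background literature, the topic's analyzed data, and previously presented abstracts supplied as style models) amounts to few-shot prompt assembly. The sketch below shows one minimal way to build such a message list; the function name, prompt wording, and message schema are illustrative assumptions, not the authors' actual materials, and the resulting list would be passed to whatever chat-completion API is in use:

```python
def build_coaching_messages(background, analyzed_data, topic, model_abstracts):
    """Assemble a chat message list that 'coaches' a model to draft an abstract.

    All prompt text here is a hypothetical reconstruction of the study's
    coaching design, not the study's verbatim prompts.
    """
    # System instruction framing the task.
    messages = [{
        "role": "system",
        "content": ("You are assisting with surgical outcomes research. "
                    "Draft a structured conference abstract in the style of "
                    "the example abstracts provided."),
    }]
    # Few-shot style models: previously presented, unassociated abstracts.
    for example in model_abstracts:
        messages.append({
            "role": "user",
            "content": "Example abstract (style model):\n" + example,
        })
    # The actual task: topic, background literature, and analyzed data.
    messages.append({
        "role": "user",
        "content": (f"Topic: {topic}\n\n"
                    f"Background literature:\n{background}\n\n"
                    f"Analyzed data:\n{analyzed_data}\n\n"
                    "Write a structured abstract (Background, Methods, "
                    "Results, Conclusions) using only the data above."),
    })
    return messages

if __name__ == "__main__":
    msgs = build_coaching_messages(
        background="Two prior series reported recurrence after ventral hernia repair.",
        analyzed_data="n=120; recurrence 8.3% vs 12.5%; P=.04",
        topic="Mesh fixation and hernia recurrence",
        model_abstracts=["Background: ... Methods: ... Results: ... Conclusions: ..."] * 2,
    )
    print(len(msgs))  # 1 system message + 2 examples + 1 task message
```

Keeping the examples and the task in separate user messages mirrors the study's separation of "model" abstracts from the topic-specific data, and makes it easy to swap in the 10 training abstracts the authors describe.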