Layperson
Readability
Clarity
Medicine
Index (typesetting)
Consistency
Inter-rater reliability
Medical physics
Statistics
Computer science
Rating scale
Internal medicine
World Wide Web
Biochemistry
Chemistry
Mathematics
Political science
Law
Programming language
Authors
Michael Eppler,Conner Ganjavi,John Knudsen,Ryan J. Davis,Oluwatobiloba Ayo‐Ajibola,Aditya Desai,Lorenzo Storino Ramacciotti,Andrew Chen,Andre De Castro Abreu,Mihir Desai,Inderbir S. Gill,Giovanni Cacciamani
Source
Journal: Urology Practice
Publisher: Ovid Technologies (Wolters Kluwer)
Date: 2023-07-06
Volume/Issue: 10 (5): 436-443
Citations: 22
Identifiers
DOI: 10.1097/upj.0000000000000428
Abstract
This study assessed ChatGPT's ability to generate readable, accurate, and clear layperson summaries of urological studies, and compared the performance of ChatGPT-generated summaries with the original abstracts and author-written patient summaries to determine its effectiveness as a potential solution for creating accessible medical literature for the public.

Articles from the top 5 ranked urology journals were selected. A ChatGPT prompt was developed following guidelines to maximize readability, accuracy, and clarity while minimizing variability. Readability scores and grade-level indicators were calculated for the ChatGPT summaries, original abstracts, and patient summaries. Two MD physicians independently rated the accuracy and clarity of the ChatGPT-generated layperson summaries. Statistical analyses were conducted to compare readability scores, and Cohen's κ coefficient was used to assess interrater reliability for the correctness and clarity evaluations.

A total of 256 journal articles were included. The ChatGPT-generated summaries were created in an average of 17.5 (SD 15.0) seconds. The readability scores of the ChatGPT-generated summaries were significantly better than those of the original abstracts: Global Readability Score 54.8 (12.3) vs 29.8 (18.5), Flesch-Kincaid Reading Ease 54.8 (12.3) vs 29.8 (18.5), Flesch-Kincaid Grade Level 10.4 (2.2) vs 13.5 (4.0), Gunning Fog Score 12.9 (2.6) vs 16.6 (4.1), SMOG Index 9.1 (2.0) vs 12.0 (3.0), Coleman-Liau Index 12.9 (2.1) vs 14.9 (3.7), and Automated Readability Index 11.1 (2.5) vs 12.0 (5.7); P < .0001 for all comparisons except the Automated Readability Index, for which P = .037. The correctness rate of ChatGPT outputs was >85% across all categories assessed, with interrater agreement (Cohen's κ) between the 2 independent physician reviewers ranging from 0.76 to 0.95.

ChatGPT can create accurate summaries of scientific abstracts for patients, and well-crafted prompts enhance user-friendliness. Although the summaries are satisfactory, expert verification remains necessary for improved accuracy.
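For readers unfamiliar with the readability metrics reported above, the following is a minimal Python sketch of the two Flesch formulas. The coefficients are the standard published ones; the syllable counter is a crude vowel-group heuristic (the tools used in studies like this rely on pronouncing dictionaries), and the function names and sample text are illustrative only.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    """Flesch Reading Ease and Flesch-Kincaid Grade Level (assumes non-empty text)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    wps = len(words) / len(sentences)                      # words per sentence
    spw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return {
        # Reading Ease: higher scores mean easier text (0-100 scale in practice)
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # Grade Level: approximate US school grade needed to follow the text
        "fk_grade_level": 0.39 * wps + 11.8 * spw - 15.59,
    }

if __name__ == "__main__":
    sample = "ChatGPT can create accurate summaries of scientific abstracts for patients."
    print(readability(sample))
```

Both formulas reward shorter sentences and shorter words, which is why plain-language summaries score markedly better than dense academic abstracts.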
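Cohen's κ, used here to quantify agreement between the two physician reviewers, corrects raw percent agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A self-contained sketch with hypothetical rater data (not from the study):

```python
from collections import Counter

def cohens_kappa(r1: list, r2: list) -> float:
    """Cohen's kappa for two raters labeling the same items (nominal scale)."""
    assert len(r1) == len(r2) and r1, "raters must score the same non-empty item set"
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # chance agreement: product of each rater's marginal label frequencies
    p_e = sum((c1[lab] / n) * (c2[lab] / n) for lab in set(c1) | set(c2))
    return (p_o - p_e) / (1 - p_e)                         # undefined if p_e == 1

# e.g. two reviewers rating 10 summaries as correct (1) / incorrect (0)
rater_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
rater_b = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
print(round(cohens_kappa(rater_a, rater_b), 2))  # -> 0.62
```

On the usual interpretive scale, the study's κ range of 0.76 to 0.95 indicates substantial to almost perfect agreement between the two reviewers.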