A. Genovese, Srinivasagam Prabha, Sahar Borna, Cesar A. Gomez-Cabello, Syed Ali Haider, Maissa Trabilsy, Cui Tao, Antonio J. Forte
Identifier
DOI:10.20944/preprints202412.0297.v1
Abstract
(1) Background: Artificial Intelligence (AI) can enhance patient education, but pre-trained models such as ChatGPT can produce inaccurate information. This study assessed a potential solution, Retrieval-Augmented Generation (RAG), for answering postoperative rhinoplasty inquiries; (2) Methods: RAG systems based on Gemini-1.0-Pro-002, Gemini-1.5-Flash-001, Gemini-1.5-Pro-001, and PaLM 2 were developed and posed 30 questions, with RAG retrieving context from plastic surgery textbooks. Responses were evaluated for accuracy (1-5 scale), comprehensiveness (1-3 scale), readability (FRE, FKGL), and understandability/actionability (PEMAT). Analysis included Wilcoxon rank sum tests, Armitage trend tests, and pairwise comparisons; (3) Results: The AI models performed well on straightforward questions but struggled with nuanced ones (e.g., failing to connect "getting the face wet" with showering), leading to a 30.8% nonresponse rate. Only 41.7% of responses were completely accurate. Gemini-1.0-Pro-002 was more comprehensive (p < 0.001), while PaLM 2 was less actionable (p < 0.007). Readability was poor (mean FRE: 40-49). Understandability averaged 0.7. No significant differences were found among models in accuracy, readability, or understandability; (4) Conclusions: RAG-based AI models show promise but are not yet suitable as standalone tools, given their nonresponses and their limitations in readability and in handling nuanced questions. Future efforts should focus on improving contextual understanding. With optimization, RAG-based AI could reduce surgeons' workload and enhance patient satisfaction, but it is currently unsafe for independent clinical use.
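To make the RAG approach concrete, the sketch below illustrates the general pattern the Methods describe: retrieve the textbook passages most relevant to a patient's question, then prepend them to the prompt so the model's answer is grounded in retrieved text rather than in its pre-training alone. The TF-IDF retriever, the sample passages, and the prompt wording are illustrative assumptions for this sketch; the study's actual pipeline (built on Gemini and PaLM 2 models) is not published here.

```python
# Minimal RAG sketch (not the authors' pipeline): TF-IDF retrieval over a
# hypothetical textbook corpus, followed by prompt assembly for a generative
# model. All passages and prompt text below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical passages standing in for plastic surgery textbook excerpts.
passages = [
    "Avoid getting the nasal splint wet; shower carefully for the first week.",
    "Mild swelling and bruising around the eyes is expected after rhinoplasty.",
    "Avoid strenuous exercise for at least two weeks postoperatively.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the question (TF-IDF cosine)."""
    vectorizer = TfidfVectorizer().fit(passages + [question])
    doc_vecs = vectorizer.transform(passages)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_vecs)[0]
    ranked = sorted(zip(scores, passages), reverse=True)
    return [p for _, p in ranked[:k]]

def build_prompt(question: str) -> str:
    """Assemble a context-grounded prompt: the core idea behind RAG is that
    the model answers from the retrieved context, not memory alone."""
    context = "\n".join(retrieve(question))
    return (
        "Answer the patient's question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# This prompt would be sent to the generative model (e.g., a Gemini variant).
print(build_prompt("When can I shower after my rhinoplasty?"))
```

For context on the readability finding: the Flesch Reading Ease score is computed as FRE = 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word), so the reported means of 40-49 fall in the "difficult," college-level band, well above the roughly sixth-grade level typically recommended for patient education materials.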