Evaluating ChatGPT on Orbital and Oculofacial Disorders: Accuracy and Readability Insights

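Readability, Intraclass correlation, Medicine, Artificial intelligence, Reliability (semiconductor), Likert scale, Statistics, Natural language processing, Medical physics, Computer science, Psychometrics, Clinical psychology, Quantum mechanics, Physics, Power (physics), Programming language, Mathematics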

Authors

Michael Balas, Ana Janic, Patrick Daigle, Navdeep Nijhawan, Ahsen Hussain, Harmeet S. Gill, Gabriela L. Lahaie, Michel J. Belliveau, Sean A. Crawford, Parnian Arjmand, Edsel Ing

Source

Journal: Ophthalmic Plastic and Reconstructive Surgery [Lippincott Williams & Wilkins]
Date: 2023-11-16; Citations: 2

Links

nih.gov, doi.org

Identifiers

DOI: 10.1097/iop.0000000000002552

Abstract

Purpose: To assess the accuracy and readability of responses generated by the artificial intelligence model, ChatGPT (version 4.0), to questions related to 10 essential domains of orbital and oculofacial disease.

Methods: A set of 100 questions related to the diagnosis, treatment, and interpretation of orbital and oculofacial diseases was posed to ChatGPT 4.0. Responses were evaluated by a panel of 7 experts based on appropriateness and accuracy, with performance scores measured on a 7-item Likert scale. Inter-rater reliability was determined via the intraclass correlation coefficient.

Results: The artificial intelligence model demonstrated accurate and consistent performance across all 10 domains of orbital and oculofacial disease, with an average appropriateness score of 5.3/6.0 (“mostly appropriate” to “completely appropriate”). Domains of cavernous sinus fistula, retrobulbar hemorrhage, and blepharospasm had the highest domain scores (average scores of 5.5 to 5.6), while the proptosis domain had the lowest (average score of 5.0/6.0). The intraclass correlation coefficient was 0.64 (95% CI: 0.52 to 0.74), reflecting moderate inter-rater reliability. The responses exhibited a high reading-level complexity, representing the comprehension levels of a college or graduate education.

Conclusions: This study demonstrates the potential of ChatGPT 4.0 to provide accurate information in the field of ophthalmology, specifically orbital and oculofacial disease. However, challenges remain in ensuring accurate and comprehensive responses across all disease domains. Future improvements should focus on refining the model’s correctness and eventually expanding the scope to visual data interpretation. Our results highlight the vast potential for artificial intelligence in educational and clinical ophthalmology contexts.
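The abstract reports inter-rater reliability as an intraclass correlation coefficient but does not say which ICC form was used. As a rough illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from a ratings matrix with one row per rated response and one column per expert. The choice of ICC(2,1), the function name `icc_2_1`, and the small example matrix are assumptions made for illustration; they are not the study's method or data.

```python
from typing import List


def icc_2_1(ratings: List[List[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per rated item and one column per rater.
    This form is an assumption; the abstract does not name the ICC variant.
    """
    n = len(ratings)          # number of rated items (rows)
    k = len(ratings[0])       # number of raters (columns)
    if n < 2 or k < 2:
        raise ValueError("need at least 2 items and 2 raters")
    if any(len(row) != k for row in ratings):
        raise ValueError("every item must be scored by every rater")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                   # between-item mean square
    ms_cols = ss_cols / (k - 1)                   # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))     # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Illustrative matrix only (6 items, 4 raters); not data from the study.
    example = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(example), 2))  # prints 0.29
```

In the study's setting the matrix would have 100 rows (one per question) and 7 columns (one per expert). The reported 95% confidence interval would need an additional F-distribution-based calculation that this sketch does not attempt.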
