Evaluation of GPT Large Language Model Performance on RSNA 2023 Case of the Day Questions
Authors
Pritam Mukherjee, Benjamin Hou, Abhinav Suri, Yan Zhuang, Christopher Parnell, N. Lee, Oana M Stroie, Ravi Jain, Kenneth C. Wang, Komal Sharma, Ronald M. Summers
Source
Journal: Radiology [Radiological Society of North America]. Date: 2024-10-01. Volume (issue): 313 (1). Cited by: 4
Background
GPT-4V (GPT-4 with vision, ChatGPT; OpenAI) has shown impressive performance on several medical assessments. However, few studies have evaluated its performance in interpreting radiologic images.

Purpose
To compare the accuracy of GPT-4V with that of radiologists and residents in interpreting radiologic cases presented with both images and textual context, to determine whether GPT-4V assistance improves human accuracy, and to compare the accuracy of GPT-4V given combined text and image inputs with its accuracy given image-only or text-only inputs.

Materials and Methods
In this observer study, 72 Case of the Day questions from the RSNA 2023 Annual Meeting were curated. GPT-4V answers were obtained between November 26 and December 10, 2023, using three input types for each question: image only, text only, and both text and images. Five radiologists and three residents also answered the questions in an "open book" setting. For the artificial intelligence (AI)-assisted portion, the radiologists and residents were given the GPT-4V outputs. The accuracy of radiologists and residents, with and without AI assistance, was analyzed using a mixed-effects linear model. The accuracies of GPT-4V with different input combinations were compared using the McNemar test.
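The McNemar test compares two conditions evaluated on the same questions, so only the discordant questions (answered correctly under one input type but not the other) carry information. A minimal sketch of the exact (binomial) form follows. It assumes each input condition is summarized as a per-question list of correct/incorrect outcomes. It is not the authors' analysis code, and the function name `mcnemar_exact` is hypothetical. The study may instead have used the chi-square variant or a statistics package. The example data are invented for illustration and are not study results.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test for paired binary outcomes.

    correct_a, correct_b: equal-length sequences of booleans, one entry per
    question, True if that condition answered the question correctly.
    Returns (b, c, p_value): b counts questions only A got right,
    c counts questions only B got right.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired sequences must have equal length")
    # Only discordant pairs enter McNemar's test.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference
    # Under H0, b ~ Binomial(n, 0.5); double the smaller tail, capped at 1.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical per-question outcomes for two input conditions (not study data).
text_and_images = [True, True, False, True, True, False, True, True]
image_only = [False, True, False, False, True, False, False, True]
print(mcnemar_exact(text_and_images, image_only))
```

The exact form is used here because it stays valid when there are few discordant questions. With only 72 questions, that situation is plausible, although which variant the study used is not stated in this abstract.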