Dyke Ferber, Isabella C. Wiest, Georg Wölflein, Matthias P. Ebert, Gernot Beutel, Jan-Niklas Eckardt, Daniel Truhn, Christoph Springfeld, Dirk Jäger, Jakob Nikolas Kather
Identifier
DOI:10.1056/aics2300235
Abstract
Oncologists face increasingly complex clinical decision-making processes as new cancer therapies are approved and treatment guidelines are revised at an unprecedented rate. With the aim of improving oncologists' efficiency and supporting their adherence to the most recent treatment recommendations, we evaluated the use of the large language model generative pretrained transformer 4 (GPT-4) to interpret guidelines from the American Society of Clinical Oncology and the European Society for Medical Oncology. The ability of GPT-4 to answer clinically relevant questions regarding the management of patients with pancreatic cancer, metastatic colorectal cancer, and hepatocellular carcinoma was assessed. We also assessed GPT-4 outputs with and without retrieval-augmented generation (RAG), which provided additional knowledge to the model, and then manually compared the results with the original guideline documents. GPT-4 with RAG provided correct responses in 84% of cases (of 218 statements, 184 were correct, 30 were inaccurate, and 4 were wrong). GPT-4 without RAG provided correct responses in only 57% of cases (of 163 statements, 93 were correct, 29 were inaccurate, and 41 were wrong). We showed that GPT-4, when enhanced with additional clinical information through RAG, can accurately identify detailed similarities and disparities in diagnostic and treatment proposals across different authoritative sources.
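The RAG setup described above follows a standard pattern: retrieve the guideline passages most relevant to a clinical question and supply them to GPT-4 as context before it answers. The sketch below illustrates that pattern only; the retrieval method (plain term overlap), the model name, and the prompt wording are assumptions chosen for illustration and do not reproduce the authors' pipeline.

```python
# Minimal sketch of retrieval-augmented generation (RAG) over guideline text.
# Illustrative only: chunking, scoring, model name, and prompt wording are
# assumptions, not the pipeline evaluated in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def retrieve(question: str, passages: list[str], k: int = 3) -> list[str]:
    """Rank guideline passages by simple term overlap with the question."""
    q_terms = set(question.lower().split())

    def score(passage: str) -> int:
        return len(q_terms & set(passage.lower().split()))

    return sorted(passages, key=score, reverse=True)[:k]


def answer_with_rag(question: str, guideline_passages: list[str]) -> str:
    """Answer a clinical question grounded in the retrieved guideline excerpts."""
    context = "\n\n".join(retrieve(question, guideline_passages))
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model identifier
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer using only the guideline excerpts provided. "
                    "If the excerpts do not cover the question, say so."
                ),
            },
            {
                "role": "user",
                "content": f"Guideline excerpts:\n{context}\n\nQuestion: {question}",
            },
        ],
    )
    return response.choices[0].message.content
```

In a real deployment the term-overlap scoring would typically be replaced by embedding-based semantic search over the ASCO and ESMO guideline documents, but the overall flow (retrieve, then generate with the retrieved context) is the same.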