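Transparency (behavior)
German
Soft tissue sarcoma
Computer science
Soft tissue
Sarcoma
Materials science
Medicine
Geography
Radiology
Pathology
Computer security
Archaeology
Authors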
Chengpeng Li,Weiwei Jia,Chu Yanjun,Franka Menge,Tobias Speer,Christoph Reißfelder,Peter Hohenberger,Jens Jakob,Cui Yang
Abstract
Introduction: This study aimed to evaluate the effectiveness of GPT-4o, with and without Retrieval-Augmented Generation (RAG), in responding to soft tissue sarcoma (STS)-related queries. Methods: The study used a 20-question dataset derived from clinical scenarios related to adult STS. Responses were generated by GPT-4o with and without the RAG approach. The RAG system incorporated the English version of the German evidence-based S3 guidelines through an embedding-based retrieval system. Two sarcoma experts evaluated the responses for accuracy, comprehensiveness, and safety using a Likert scale. Statistical analyses were conducted to compare the performance of the two approaches. Results: GPT-4o with RAG outperformed the model without RAG across all evaluated areas (p < 0.05). GPT-4o without RAG had a 40% error rate, which the RAG approach reduced to 10%. For 90% of the questions, the retrieval system correctly cited the pages containing the information relevant to the question. Conclusion: The RAG approach significantly enhanced the performance of GPT-4o in answering STS-related questions. However, the model still produced incorrect responses in certain complex scenarios. GPT-4o, even with RAG, should be used cautiously in clinical settings, particularly for rare diseases like sarcoma. Human expertise remains irreplaceable in medical decision-making.