作者
Sheza Malik,Lewis J. Frey,Jason Gutman,Asim Mushtaq,Fatima Warraich,Kamran Qureshi
摘要
Introduction: Recent advancements in Artificial Intelligence (AI), particularly through the deployment of Large Language Models (LLMs), have profoundly impacted healthcare. This study assesses five LLMs—ChatGPT 3.5, ChatGPT 4, BARD, CLAUDE, and COPILOT—on their response accuracy, clarity, and relevance to queries concerning acute liver failure (ALF). We subsequently compare these results with Chat GPT4 enhanced with Retrieval Augmented Generation (RAG) technology. Methods: Based on real-world clinical use and the American College of Gastroenterology guidelines, we formulated 16 ALF questions or clinical scenarios to explore LLMs' ability to handle different clinical questions. Using the "New Chat" functionality, each query was processed individually across the models to reduce any bias. Additionally, we employed the RAG functionality of GPT-4, which integrates external sources as references to ground the results. All responses were evaluated on a Likert scale from 1 to 5 for accuracy, clarity, and relevance by four independent investigators to ensure impartiality. Result: ChatGPT 4, augmented with RAG, demonstrated superior performance compared to others, consistently scoring the highest (4.70, 4.89, 4.78) across all three domains. ChatGPT 4 exhibited notable proficiency, with scores of 3.67 in accuracy, 4.04 in clarity, and 4.01 in relevance. In contrast, CLAUDE achieved 3.04 in clarity, 3.6 in relevance, and 3.65 in accuracy. Meanwhile, BARD and COPILOT exhibited lower performance levels; BARD recorded scores of 2.01 in accuracy and 3.03 in relevance, while COPILOT obtained 2.26 in accuracy and 3.12 in relevance. Conclusion: The study highlights Chat GPT 4 +RAG's superior performance compared to other LLMs. By integrating RAG with LLMs, the system combines generative language skills with accurate, up-to-date information. This improves response clarity, relevance, and accuracy, making them more effective in healthcare. However, AI models must continually evolve and align with medical practices for successful healthcare integration.