Medicine
Emergency department triage
Recall
Diagnostic accuracy
Psychological intervention
Workflow
Artificial intelligence
Transparency (behavior)
Precision medicine
Precision and recall
Emergency medicine
Radiology
Pathology
Cognitive psychology
Computer science
Psychiatry
Computer security
Psychology
Database
Authors
Gianluca Marcaccini,Ishith Seth,Yi Xie,Pietro Susini,Mirco Pozzi,Roberto Cuomo,Warren M. Rozen
Abstract
Background: Hand fracture management requires precise diagnostic accuracy and complex decision-making. Advances in artificial intelligence (AI) suggest that large language models (LLMs) may assist, or even rival, traditional clinical approaches. This study evaluates the effectiveness of ChatGPT-4o, DeepSeek-V3, and Gemini 1.5 in diagnosing and recommending treatment strategies for hand fractures, compared with the decisions of experienced surgeons. Methods: A retrospective analysis of 58 anonymized hand fracture cases was conducted. Clinical details, including fracture site, displacement, and soft-tissue involvement, were provided to the AI models, which generated management plans. Their recommendations were compared with the actual surgeon decisions, and accuracy, precision, recall, and F1 score were assessed. Results: ChatGPT-4o demonstrated the highest accuracy (98.28%) and recall (91.74%), effectively identifying most correct interventions but occasionally proposing extraneous options (precision 58.48%). DeepSeek-V3 showed moderate accuracy (63.79%), with balanced precision (61.17%) and recall (57.89%), sometimes omitting correct treatments. Gemini 1.5 performed poorly (accuracy 18.97%), with low precision and recall, indicating substantial limitations in clinical decision support. Conclusions: AI models can enhance clinical workflows, particularly in radiographic interpretation and triage, but their limitations highlight the irreplaceable role of human expertise in complex hand trauma management. ChatGPT-4o demonstrated promising accuracy but requires refinement. Ethical concerns regarding AI-driven medical decisions, including bias and transparency, must be addressed before widespread clinical implementation.
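The abstract reports accuracy, precision, recall, and F1 score for each model's treatment recommendations. As a minimal sketch of how such metrics are derived from a confusion matrix, the snippet below uses purely hypothetical counts (not the study's actual data) to show the standard formulas:

```python
# Sketch of the evaluation metrics named in the abstract.
# The counts used in the example call are illustrative only,
# not taken from the study.

def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts.

    tp: recommended interventions that matched the surgeons' decisions
    fp: recommended interventions the surgeons did not choose
    fn: surgeon-chosen interventions the model failed to recommend
    tn: interventions correctly not recommended
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical example: 12 correct recommendations, 8 extraneous ones,
# 3 missed interventions, and 25 correct rejections.
acc, prec, rec, f1 = classification_metrics(tp=12, fp=8, fn=3, tn=25)
print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%} F1={f1:.2%}")
```

A pattern like ChatGPT-4o's (high recall, lower precision) corresponds to a model that captures most surgeon-chosen interventions but also proposes extraneous ones, inflating the false-positive count.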