Harm
Specialty
Correctness
Medicine
Psychology
Family medicine
Computer science
Social psychology
Programming language
Authors
Zhuo Ran Cai, Michael L. Chen, Jiyeong Kim, Roberto A. Novoa, Leandra A. Barnes, Andrew L. Beam, Eleni Linos
Identifier
DOI: 10.1016/j.jid.2024.01.015
Abstract
Artificial intelligence (AI)-based large language models (LLMs) have shown promising performance in medical applications, including on specialty board examination questions and complex clinical cases (Beam et al., 2023; Eriksen et al., 2023). Previous reports have evaluated the performance of LLMs on dermatology practice board examination questions (Passby et al., 2023; Joly-Chevrier et al., 2023; Mirza et al., 2024), but how LLMs perform compared with practicing dermatologists has not been established. In addition, dermatologists and patients who use LLMs rely on the quality of the models' prose (free-text) responses rather than on multiple-choice selections, yet the clinical reasoning in these prose responses remains poorly understood in the literature.
We conducted what is, to our knowledge, the first study to compare the specialty knowledge of a commonly used LLM with that of dermatologists and to have board-certified dermatologists manually evaluate the quality of the model's prose responses.
References
Beam K, Sharma P, Kumar B, Wang C, Brodsky D, Martin CR, et al. Performance of a Large Language Model on Practice Questions for the Neonatal Board Examination. JAMA Pediatr. 2023;177:977-979.
Eriksen AV, Moller S, Ryg J. Use of GPT-4 to Diagnose Complex Clinical Cases. NEJM AI. 2023;1.
Joly-Chevrier M, Nguyen AX, Lesko-Krleza M, Lefrançois P. Performance of ChatGPT on a Practice Dermatology Board Certification Examination. J Cutan Med Surg. 2023;27:407-409.
Mirza FN, Lim RK, Yumeen S, Wahood S, Zaidat B, Shah A, et al. Performance of Three Large Language Models on Dermatology Board Examinations. J Invest Dermatol. 2024;144:398-400.
Passby L, Jenko N, Wernham A. Performance of ChatGPT on dermatology Specialty Certificate Examination multiple choice questions [e-pub ahead of print]. Clin Exp Dermatol. 2023. https://doi.org/10.1093/ced/llad197