Keywords
Population; Incidence (epidemiology); Medicine; Misinformation; National Health and Nutrition Examination Survey; Epidemiology; Demography; Pathology; Computer science; Environmental health; Mathematics; Computer security; Geometry; Sociology
Authors
Ryan M. Blake, Johnathan A. Khusid
Source
Journal: Journal of Endourology [Mary Ann Liebert]
Date: 2024-05-15
Identifiers
DOI: 10.1089/end.2023.0703
Abstract
Introduction: Artificial intelligence tools such as the large language models (LLMs) Bard and ChatGPT have generated significant research interest. Using these LLMs to study the epidemiology of a target population could benefit urologists. We investigated whether Bard and ChatGPT can perform a large-scale calculation of the incidence and prevalence of kidney stone disease.

Materials and Methods: We obtained reference values from two published studies that used the National Health and Nutrition Examination Survey (NHANES) database to calculate the prevalence and incidence of kidney stone disease. We then tested the capability of Bard and ChatGPT to perform similar calculations using two different methods. First, we instructed the LLMs to access the datasets and perform the calculation independently. Second, we instructed the interfaces to generate customized computer code that could perform the calculation on downloaded datasets.

Results: While ChatGPT denied the ability to access and perform calculations on the NHANES database, Bard intermittently claimed the ability to do so. Bard's results were sometimes accurate but otherwise inaccurate and inconsistent. For example, Bard's "calculations" for the incidence of kidney stones from 2015-2018 were 2.1% (95% CI: 1.5-2.7), 1.75% (95% CI: 1.6-1.9), and 0.8% (95% CI: 0.7-0.9), while the published figure was 2.1% (95% CI: 1.5-2.7). Bard provided discrete mathematical details of its calculations; however, when prompted further, it admitted to having obtained the numbers from online sources, including our chosen reference papers, rather than from a de novo calculation. Both LLMs were able to produce Python code to use on the downloaded NHANES datasets; however, this code would not readily execute.

Conclusions: ChatGPT and Bard are currently incapable of performing epidemiological calculations and lack transparency and accountability. Caution should be used, particularly with Bard, as its claims of capability were convincingly misleading and its results were inconsistent.
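Editor's note: the following is a minimal sketch of the kind of script the study asked the LLMs to generate, namely a survey-weighted prevalence estimate of self-reported kidney stone history from downloaded NHANES files. It is not the code produced by Bard or ChatGPT in the study. The file and variable names (DEMO_J.XPT, KIQ_U_J.XPT, SEQN, WTINT2YR, KIQ026) are taken from the public 2017-2018 NHANES codebook and should be verified against the actual downloads.

# Sketch: survey-weighted prevalence of kidney stone history from NHANES 2017-2018.
# Assumes DEMO_J.XPT and KIQ_U_J.XPT have been downloaded to the working directory.
import pandas as pd

# Demographics file (respondent ID and interview weights) and the
# kidney-conditions questionnaire file, both in SAS transport (XPT) format.
demo = pd.read_sas("DEMO_J.XPT", format="xport")   # SEQN, WTINT2YR, ...
kiq = pd.read_sas("KIQ_U_J.XPT", format="xport")   # SEQN, KIQ026, ...

# Merge on the respondent sequence number.
df = demo.merge(kiq[["SEQN", "KIQ026"]], on="SEQN", how="inner")

# KIQ026: "Ever had kidney stones?" (1 = Yes, 2 = No; 7/9 = refused/don't know).
df = df[df["KIQ026"].isin([1, 2])]

# Survey-weighted prevalence: weighted share of "Yes" responses.
weights = df["WTINT2YR"]
prevalence = (weights * (df["KIQ026"] == 1)).sum() / weights.sum()
print(f"Weighted prevalence of kidney stone history: {prevalence:.1%}")

A full reproduction of the reference estimates would additionally need to combine survey cycles, rescale the weights accordingly, and use the stratum and PSU design variables (for example, via a survey-design library) to obtain valid confidence intervals.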