Document
Computer Science
Signal (programming language)
Data Science
Data Mining
Information Retrieval
Artificial Intelligence
Operating System
Programming Language
Authors
Maria Galanty,Dieuwertje Luitse,Sijm H. Noteboom,P M Croon,Alexander P. J. Vlaar,Thomas Poell,Clara I. Sánchez,Tobias Blanke,Ivana Išgum
Identifier
DOI:10.1038/s41598-024-83218-5
Abstract
Medical datasets are vital for advancing Artificial Intelligence (AI) in healthcare. Yet biases in the datasets on which deep-learning models are trained can compromise reliability. This study investigates biases stemming from dataset-creation practices. Drawing on existing guidelines, we first developed the BEAMRAD tool to assess the documentation of public Magnetic Resonance Imaging (MRI), Color Fundus Photography (CFP), and Electrocardiogram (ECG) datasets. In doing so, we provide an overview of the biases that may emerge due to inadequate dataset documentation. Second, we examine the current state of documentation for public medical image and signal data. Our research reveals substantial variance in the documentation of image and signal datasets, even though guidelines have been developed for medical imaging. This indicates that dataset documentation is subject to individual discretionary decisions. Furthermore, we find that aspects such as hardware and data acquisition details are commonly documented, while information regarding data annotation practices, annotation error quantification, or data limitations is not consistently reported. This has considerable implications for data users' ability to detect potential sources of bias through these aspects and to develop reliable and robust models that can be adapted for clinical practice.
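The abstract does not describe how BEAMRAD actually scores documentation, but the general idea of checking a dataset's documentation against a fixed list of aspects (hardware, acquisition, annotation practices, error quantification, limitations) can be illustrated with a minimal sketch. The checklist item names, the `DatasetDocumentation` class, and the completeness metric below are hypothetical assumptions for illustration only, not the authors' tool.

```python
from dataclasses import dataclass, field

# Hypothetical checklist items, loosely inspired by the aspects named in the
# abstract. These labels are illustrative assumptions, not BEAMRAD's items.
CHECKLIST_ITEMS = [
    "hardware_details",
    "acquisition_protocol",
    "annotation_practices",
    "annotation_error_quantification",
    "data_limitations",
]


@dataclass
class DatasetDocumentation:
    """Records which checklist items a public dataset's documentation covers."""
    name: str
    covered: set[str] = field(default_factory=set)

    def completeness(self) -> float:
        """Fraction of checklist items that are documented (0.0 to 1.0)."""
        return len(self.covered & set(CHECKLIST_ITEMS)) / len(CHECKLIST_ITEMS)

    def missing(self) -> list[str]:
        """Undocumented items, i.e. aspects where bias could go undetected."""
        return [item for item in CHECKLIST_ITEMS if item not in self.covered]


if __name__ == "__main__":
    # Example: a dataset documenting hardware and acquisition, but nothing
    # about annotation or limitations (a pattern the abstract reports as common).
    doc = DatasetDocumentation(
        name="example-mri-dataset",
        covered={"hardware_details", "acquisition_protocol"},
    )
    print(f"{doc.name}: {doc.completeness():.0%} documented; missing: {doc.missing()}")
```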