Authors
Shadi Ebrahimian, Mannudeep K. Kalra, Sheela Agarwal, Bernardo C. Bizzo, Mona Elkholy, Christoph Wald, Bibb Allen, Keith J. Dreyer
Abstract
To assess key trends, strengths, and gaps in validation studies of the Food and Drug Administration (FDA)-regulated imaging-based artificial intelligence/machine learning (AI/ML) algorithms.

We audited publicly available details of regulated AI/ML algorithms in imaging from 2008 until April 2021. We reviewed 127 regulated software products (118 AI/ML) to classify information related to their parent company, subspecialty, body area and specific anatomy type, imaging modality, date of FDA clearance, indications for use, target pathology (such as trauma) and findings (such as fracture), technique (CAD triage, CAD detection and/or characterization, CAD acquisition or improvement, and image processing/quantification), product performance, and the presence, type, strength, and availability of clinical validation data. For validation data, where available, we recorded the number of patients or studies included, sensitivity, specificity, accuracy, and/or area under the receiver operating characteristic curve, along with information on ground-truthing of use-cases. Data were analyzed with pivot tables and charts for descriptive statistics and trends.

We noted an increasing number of FDA-regulated AI/ML algorithms from 2008 to 2021. Seventeen (17/118) regulated AI/ML algorithms posted no validation claims or data. Just 9/118 reviewed AI/ML algorithms had validation dataset sizes of over 1000 patients. The most common types of AI/ML algorithms were image processing/quantification (IPQ; n = 59/118) and triage (CADt; n = 27/118). Brain, breast, and lungs dominated the targeted body regions of interest.

Insufficient public information on validation datasets in several FDA-regulated AI/ML algorithms makes it difficult to justify clinical applications, since their generalizability and presence of bias cannot be inferred.
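To put the reported counts on a common scale, the fractions quoted above can be converted to percentages of the 118 AI/ML algorithms. The following is a minimal sketch using only the counts stated in the abstract; the dictionary labels and the `share` helper are illustrative names, not part of the authors' analysis, which was done with pivot tables and charts.

```python
# Counts quoted in the abstract, each out of 118 reviewed AI/ML algorithms.
TOTAL_AI_ML = 118

# Labels are illustrative shorthand for the categories named in the abstract.
reported_counts = {
    "no validation claims or data": 17,
    "validation dataset > 1000 patients": 9,
    "image processing/quantification (IPQ)": 59,
    "triage (CADt)": 27,
}


def share(count, total=TOTAL_AI_ML):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical helper output: percentage of the 118 algorithms per category.
percentages = {label: share(n) for label, n in reported_counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {reported_counts[label]}/{TOTAL_AI_ML} = {pct}%")
```

Read this way, roughly one in seven algorithms had no posted validation claims or data, and fewer than one in twelve reported a validation dataset of more than 1000 patients.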