Authors
Stefan M. Niehues,Lisa C. Adams,Robert Gaudin,Christoph Erxleben,Sarah Keller,Marcus R. Makowski,Janis L. Vahldiek,Keno K. Bressem
Abstract
Validation of deep learning models should consider bedside chest radiographs (CXRs) separately, as they are the most challenging to interpret, while at the same time the resulting diagnoses are important for managing critically ill patients. We therefore aimed to develop and evaluate deep learning models for the identification of clinically relevant abnormalities in bedside CXRs, using reference standards established by computed tomography (CT) and multiple radiologists.

In this retrospective study, we used a dataset of 18,361 bedside CXRs of patients treated at a level 1 medical center between January 2009 and March 2019. All included CXRs were acquired within 24 hours before or after a chest CT. A deep learning algorithm was developed to identify 8 findings on bedside CXRs (cardiac congestion, pleural effusion, air-space opacification, pneumothorax, central venous catheter, thoracic drain, gastric tube, and tracheal tube/cannula). For the training dataset, 17,275 combined labels were extracted from the CXR and CT reports by a deep learning natural language processing (NLP) tool; in cases of disagreement between the CXR and CT reports, human-in-the-loop annotations were used. The test dataset consisted of 583 images, each evaluated by 4 radiologists.
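The label-combination rule described above (accept a label when the CXR and CT reports agree, otherwise route the case to a human annotator) can be sketched as follows. This is an illustrative sketch only; the function name `combine_labels` and the binary label encoding are assumptions, not the authors' pipeline.

```python
def combine_labels(cxr_label: int, ct_label: int):
    """Combine per-finding labels from the CXR and CT reports.

    Returns the consensus label (0 or 1) when both reports agree,
    or None to flag the case for human-in-the-loop annotation.
    """
    if cxr_label == ct_label:
        return cxr_label
    return None  # disagreement -> send to a human annotator


# Example: agreement is accepted, disagreement is flagged for review.
assert combine_labels(1, 1) == 1
assert combine_labels(1, 0) is None
```

In practice such a rule would be applied per finding and per study, so a single radiograph can mix automatically accepted and manually reviewed labels.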
Performance was assessed by area under the receiver operating characteristic (ROC) curve (AUC) analysis, sensitivity, specificity, and positive predictive value. AUCs were 0.90 for cardiac congestion (95% confidence interval [CI], 0.87-0.93; 3 radiologists on the ROC curve), 0.95 for pleural effusion (95% CI, 0.93-0.96; 3 radiologists on the ROC curve), 0.85 for air-space opacification (95% CI, 0.82-0.89; 1 radiologist on the ROC curve), 0.92 for pneumothorax (95% CI, 0.89-0.95; 1 radiologist on the ROC curve), 0.99 for central venous catheter (95% CI, 0.98-0.99), 0.99 for thoracic drain (95% CI, 0.98-0.99), 0.98 for gastric tube (95% CI, 0.97-0.99), and 0.99 for tracheal tube/cannula (95% CI, 0.98-1.00).

A deep learning model developed specifically for bedside CXRs showed performance similar to that of expert radiologists. It could therefore be used to detect clinically relevant findings after hours and help emergency and intensive care physicians focus on patient care.
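The evaluation metric above, AUC with a 95% CI, can be reproduced in principle with the Mann-Whitney formulation of the AUC and a bootstrap interval. The sketch below uses only the standard library and synthetic scores; it is not the authors' evaluation code, and the function names and the bootstrap settings (2,000 resamples) are assumptions.

```python
import random


def auroc(labels, scores):
    """AUC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive case outranks a randomly chosen negative,
    with ties counted as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap 95% CI for the AUC (resampling cases)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # resample must contain both classes
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

For a multi-label problem such as the 8 findings reported here, this computation would be run once per finding on the model's per-finding scores against the radiologist reference labels.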