Authors
Lennart Schmidt, Oliver Rieger, Mark Neznansky, Max Hackelöer, Lisa Antonia Dröge, Wolfgang Henrich, David Higgins, Stefan Verlohren
Abstract
Preeclampsia is a common burden on pregnant women, with an estimated incidence of 2% to 5%. It increases the maternal risk of death 20-fold and is one of the main causes of perinatal morbidity and mortality. Novel biomarkers, such as soluble fms-like tyrosine kinase-1 and placental growth factor, together with a wide span of conventional clinical data (medical history, physical symptoms, laboratory parameters, etc.), present an excellent basis for the application of early-detection machine-learning models.

This study aimed to develop, train, and test an automated machine-learning model for the prediction of adverse outcomes in patients with suspected preeclampsia.

Our real-world dataset of 1647 women (2472 samples) was retrospectively collected from women who presented to the Department of Obstetrics at the Charité - Universitätsmedizin Berlin, Berlin, Germany, between July 2010 and March 2019. After standardization and data cleaning, we calculated additional features from the biomarkers soluble fms-like tyrosine kinase-1 and placental growth factor and from sonography data (umbilical artery pulsatility index, middle cerebral artery pulsatility index, mean uterine artery pulsatility index), resulting in a total of 114 features. The prediction target was the occurrence of adverse outcomes throughout the remaining pregnancy and up to 2 weeks after delivery. We trained 2 different models, a gradient-boosted tree and a random forest classifier. Hyperparameter tuning was performed using a grid search approach. All results were evaluated via a 10 × 10-fold cross-validation regimen.

We obtained metrics for the 2 naive machine-learning models. The gradient-boosted tree model achieved a positive predictive value of 88%±6%, a negative predictive value of 89%±3%, a sensitivity of 66%±5%, a specificity of 97%±2%, an overall accuracy of 89%±3%, an area under the receiver operating characteristic curve of 0.82±0.03, an F1 score of 0.76±0.04, and a threat score of 0.61±0.05. The random forest classifier returned an equal positive predictive value (88%±6%) and specificity (97%±1%) while performing slightly worse on the other available metrics. Applying differential cutoffs to the classifier's output, instead of the naive cutoff of ≥0.5 for a positive prediction, yielded additional increases in performance.

Machine-learning techniques were a valid approach to improve the prediction of adverse outcomes in pregnant women at high risk of preeclampsia compared with current clinical standard techniques. Furthermore, we presented an automated system that did not rely on manual tuning or adjustments.
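
For readers less familiar with the reported metrics, the following minimal Python sketch shows how each one follows from the four confusion-matrix counts, and how moving the probability cutoff away from 0.5 changes those counts. It is an illustration only, not the authors' code: the function names `confusion_counts` and `summary_metrics`, the toy labels and probabilities, and the cutoff of 0.4 are all assumptions made for this example, and the abstract does not state how the differential cutoffs were actually chosen.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     cutoff: float = 0.5) -> Dict[str, int]:
    """Count TP/FP/TN/FN when 'positive' means predicted probability >= cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, prob in zip(y_true, y_prob):
        predicted_positive = prob >= cutoff
        if predicted_positive and truth:
            counts["tp"] += 1
        elif predicted_positive and not truth:
            counts["fp"] += 1
        elif truth:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def summary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive the metrics named in the abstract (assumes no denominator is zero)."""
    return {
        "ppv": tp / (tp + fp),                   # positive predictive value
        "npv": tn / (tn + fn),                   # negative predictive value
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "threat_score": tp / (tp + fp + fn),     # critical success index
    }


# Toy data invented for illustration; not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.6, 0.4, 0.2, 0.55, 0.1, 0.3, 0.45]

naive = summary_metrics(**confusion_counts(y_true, y_prob, cutoff=0.5))
shifted = summary_metrics(**confusion_counts(y_true, y_prob, cutoff=0.4))

for name in naive:
    print(f"{name}: cutoff 0.5 -> {naive[name]:.2f}, cutoff 0.4 -> {shifted[name]:.2f}")
```

In this toy example, lowering the cutoff from 0.5 to 0.4 turns two false negatives into true positives, so sensitivity rises while specificity is unchanged. On real data, a lower cutoff typically trades some specificity for sensitivity, which is why the choice of cutoff matters for a screening setting like the one described here.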