作者
Helen Y Yang,Brian Leahy,Won-Dong Jang,Donglai Wei,Yael Kalma,Roni Rahav,Ariella Carmon,Rotem Kopel,Foad Azem,Marta Venturas,Colm P. Kelleher,Liz Cam,Hanspeter Pfister,Daniel Needleman,Dalit Ben‐Yosef
摘要
Abstract STUDY QUESTION Can the BlastAssist deep learning pipeline perform comparably to or outperform human experts and embryologists at measuring interpretable, clinically relevant features of human embryos in IVF? SUMMARY ANSWER The BlastAssist pipeline can measure a comprehensive set of interpretable features of human embryos and either outperform or perform comparably to embryologists and human experts in measuring these features, WHAT IS KNOWN ALREADY Some studies have applied deep learning and developed ‘black-box’ algorithms to predict embryo viability directly from microscope images and videos but these lack interpretability and generalizability. Other studies have developed deep learning networks to measure individual features of embryos but fail to conduct careful comparisons to embryologists’ performance, which are fundamental to demonstrate the network’s effectiveness. STUDY DESIGN, SIZE, DURATION We applied the BlastAssist pipeline to 67 043 973 images (32 939 embryos) recorded in the IVF lab from 2012 to 2017 in Tel Aviv Sourasky Medical Center. We first compared the pipeline measurements of individual images/embryos to manual measurements by human experts for sets of features, including: (i) fertilization status (n = 207 embryos), (ii) cell symmetry (n = 109 embryos), (iii) degree of fragmentation (n = 6664 images), and (iv) developmental timing (n = 21 036 images). We then conducted detailed comparisons between pipeline outputs and annotations made by embryologists during routine treatments for features, including: (i) fertilization status (n = 18 922 embryos), (ii) pronuclei (PN) fade time (n = 13 781 embryos), (iii) degree of fragmentation on Day 2 (n = 11 582 embryos), and (iv) time of blastulation (n = 3266 embryos). In addition, we compared the pipeline outputs to the implantation results of 723 single embryo transfer (SET) cycles, and to the live birth results of 3421 embryos transferred in 1801 cycles. PARTICIPANTS/MATERIALS, SETTING, METHODS In addition to EmbryoScope™ image data, manual embryo grading and annotations, and electronic health record (EHR) data on treatment outcomes were also included. We integrated the deep learning networks we developed for individual features to construct the BlastAssist pipeline. Pearson’s χ2 test was used to evaluate the statistical independence of individual features and implantation success. Bayesian statistics was used to evaluate the association of the probability of an embryo resulting in live birth to BlastAssist inputs. MAIN RESULTS AND THE ROLE OF CHANCE The BlastAssist pipeline integrates five deep learning networks and measures comprehensive, interpretable, and quantitative features in clinical IVF. The pipeline performs similarly or better than manual measurements. For fertilization status, the network performs with very good parameters of specificity and sensitivity (area under the receiver operating characteristics (AUROC) 0.84–0.94). For symmetry score, the pipeline performs comparably to the human expert at both 2-cell (r = 0.71 ± 0.06) and 4-cell stages (r = 0.77 ± 0.07). For degree of fragmentation, the pipeline (acc = 69.4%) slightly under-performs compared to human experts (acc = 73.8%). For developmental timing, the pipeline (acc = 90.0%) performs similarly to human experts (acc = 91.4%). There is also strong agreement between pipeline outputs and annotations made by embryologists during routine treatments. For fertilization status, the pipeline and embryologists strongly agree (acc = 79.6%), and there is strong correlation between the two measurements (r = 0.683). For degree of fragmentation, the pipeline and embryologists mostly agree (acc = 55.4%), and there is also strong correlation between the two measurements (r = 0.648). For both PN fade time (r = 0.787) and time of blastulation (r = 0.887), there’s strong correlation between the pipeline and embryologists. For SET cycles, 2-cell time (P < 0.01) and 2-cell symmetry (P < 0.03) are significantly correlated with implantation success rate, while other features showed correlations with implantation success without statistical significance. In addition, 2-cell time (P < 5 × 10−11), PN fade time (P < 5 × 10−10), degree of fragmentation on Day 3 (P < 5 × 10−4), and 2-cell symmetry (P < 5 × 10−3) showed statistically significant correlation with the probability of the transferred embryo resulting in live birth. LIMITATIONS, REASONS FOR CAUTION We have not tested the BlastAssist pipeline on data from other clinics or other time-lapse microscopy (TLM) systems. The association study we conducted with live birth results do not take into account confounding variables, which will be necessary to construct an embryo selection algorithm. Randomized controlled trials (RCT) will be necessary to determine whether the pipeline can improve success rates in clinical IVF. WIDER IMPLICATIONS OF THE FINDINGS BlastAssist provides a comprehensive and holistic means of evaluating human embryos. Instead of using a black-box algorithm, BlastAssist outputs meaningful measurements of embryos that can be interpreted and corroborated by embryologists, which is crucial in clinical decision making. Furthermore, the unprecedentedly large dataset generated by BlastAssist measurements can be used as a powerful resource for further research in human embryology and IVF. STUDY FUNDING/COMPETING INTEREST(S) This work was supported by Harvard Quantitative Biology Initiative, the NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard (award number 1764269), the National Institute of Heath (award number R01HD104969), the Perelson Fund, and the Sagol fund for embryos and stem cells as part of the Sagol Network. The authors declare no competing interests. TRIAL REGISTRATION NUMBER Not applicable.