Abstract
See editorial on page 332.

Screening colonoscopy is effective in reducing colorectal cancer risk but also represents a substantial financial burden [1: Senore C, et al. Gut. 2019;68:1232-1244; 2: Krzeczewski B, et al. Pol Arch Intern Med. 2021;131:128-135]. Novel strategies based on artificial intelligence (AI; computer-aided diagnosis [CADx]) may enable targeted removal of only those polyps deemed to be neoplastic, thus sparing patients the unnecessary removal of non-neoplastic polyps and reducing histopathology costs [3: Hassan C, et al. Clin Gastroenterol Hepatol. 2022;20:2505-2513.e4; 4: Rondonotti E, et al. Endoscopy. 2023;55:14-22; 5: Barua I, et al. NEJM Evid. 2022;1; 6: Weigt J, et al. Endoscopy. 2022;54:180-184]. The American Society for Gastrointestinal Endoscopy recommends, as a threshold for optical diagnosis, a negative predictive value (NPV) of at least 90% for neoplastic histology in rectosigmoid polyps ≤5 mm [7: Rex DK, et al. Gastrointest Endosc. 2011;73:419-422]. Several CADx systems for optical diagnosis of colorectal polyps are commercially available [6; 8: Biffi C, et al. NPJ Digit Med. 2022;5:84]. Each CADx system has been trained and validated with different polyp datasets [3-5; 9: Mori Y, et al. Ann Intern Med. 2018;169:357-366], with little if any clinical information available on these datasets. This variability may affect the clinical outcome of optical diagnosis–based strategies. Thus, we performed a head-to-head trial to compare the real-life performance of 2 commercially available CADx systems for the optical diagnosis of colorectal polyps.

The COMBO-CAD study (Characterization cOMparison Between twO CAD systems; ClinicalTrials.gov registration no. NCT05141409) is a prospective, head-to-head comparison trial of 2 commercially available CADx systems (CAD-EYE [Fujifilm Co, Tokyo, Japan] and GI-Genius [version 3.0.0; Medtronic]) conducted in a single academic endoscopy center in Italy (Humanitas Research Hospital, Milan). At colonoscopy, the same polyp was visualized simultaneously by the same endoscopist on 2 different monitors, each showing the output of 1 of the 2 CADx systems. Pre- and post-CADx human diagnoses were also collected.

From January 1, 2022 to March 31, 2022, 176 consecutive patients (60.8% men; mean age, 60.2 years) aged 40 or older undergoing colonoscopy for colorectal cancer screening, polypectomy surveillance, or gastrointestinal signs or symptoms were enrolled in the study (Figure 1A, Supplementary Methods, and Supplementary Table 1). Of 543 polyps that were detected and removed, 169 (31.3%) were adenomas and 373 (68.7%) were nonadenomas. Of these, 325 (59.9%) were rectosigmoid polyps ≤5 mm in diameter and thus eligible for analysis (44 adenomas [13.5%] and 281 nonadenomas [86.5%]). The 2 systems are referred to as CADx-A (the CAD-EYE system) and CADx-B (the GI-Genius system). CADx-A provided a prediction output for all 325 rectosigmoid polyps ≤5 mm, whereas CADx-B was not able to provide an output for 6 polyps (1.8%). These 6 polyps were excluded from the primary analysis, and 319 lesions (44 adenomas and 275 nonadenomas) were ultimately included.
The NPV for rectosigmoid polyps ≤5 mm was 97.0% (95% confidence interval [CI], 95.0%–99.0%) with CADx-A and 97.7% (95% CI, 95.9%–99.5%) with CADx-B (rate ratio, 0.99; 95% CI, 0.98–1.01), and sensitivity for adenomas was 81.8% (95% CI, 70.4%–93.2%) and 86.4% (95% CI, 76.2%–96.5%), respectively (difference of proportions, –0.05; 95% CI, –0.107 to 0.06; P = .157) (Supplementary Table 2). The accuracy of CADx-A was 93.2% compared with 91.5% for CADx-B (difference of proportions, 0.016; 95% CI, –0.013 to 0.046). Based on AI prediction alone, 269 of 319 polyps (84.3%) with CADx-A and 260 of 319 polyps (81.5%) with CADx-B would have been classified as non-neoplastic and thus would have avoided removal. This corresponded to a specificity of 94.9% (95% CI, 92.3%–97.5%) and 92.4% (95% CI, 89.2%–95.5%), respectively, a difference that was not statistically significant. Concordance in histology prediction between the 2 AI systems was 94.7% (302/319; κ = 0.81; 95% CI, 0.73–0.90).

Based on the 2020 US Multi-Society Task Force on Colorectal Cancer (USMSTF) guidelines [10: Gupta S, et al. Gastrointest Endosc. 2020;91:463-485.e5], the agreement with histopathology in surveillance interval assignment was 84.7% (149/176; 95% CI, 78.3%–89.5%) for CADx-A and 89.2% (157/176; 95% CI, 83.4%–93.3%) for CADx-B. When applying the 2020 European Society of Gastrointestinal Endoscopy (ESGE) guidelines as reference [11: Hassan C, et al. Endoscopy. 2020;52:687-700], the agreement was 98.3% (173/176; 95% CI, 94.7%–99.6%) for both systems.

For ≤5-mm rectosigmoid polyps, the NPV of unassisted optical diagnosis was 97.8% (95% CI, 96.0%–99.5%) for a high-confidence diagnosis, which was not significantly different from the NPV of CADx-A (96.9%; 95% CI, 94.8%–99.0%) or CADx-B (97.6%; 95% CI, 95.8%–99.5%).
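To make the relationship between the stand-alone metrics above explicit, the following minimal sketch recomputes them from 2 × 2 counts. The counts are not reported in the text; they are back-calculated here from the published percentages (44 adenomas and 275 nonadenomas per system), so they should be read as an assumption. The class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2 x 2 counts with histology as reference (adenoma = positive)."""

    tp: int  # adenoma predicted as adenoma
    fn: int  # adenoma predicted as nonadenoma
    tn: int  # nonadenoma predicted as nonadenoma
    fp: int  # nonadenoma predicted as adenoma

    def metrics(self) -> dict:
        total = self.tp + self.fn + self.tn + self.fp
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "ppv": self.tp / (self.tp + self.fp),
            "accuracy": (self.tp + self.tn) / total,
            # Share of polyps predicted non-neoplastic (candidates to leave in situ).
            "predicted_non_neoplastic": (self.tn + self.fn) / total,
        }


# Assumption: counts back-calculated from the reported percentages,
# not taken directly from the article.
CADX_A = Confusion(tp=36, fn=8, tn=261, fp=14)
CADX_B = Confusion(tp=38, fn=6, tn=254, fp=21)

if __name__ == "__main__":
    for name, table in (("CADx-A", CADX_A), ("CADx-B", CADX_B)):
        summary = ", ".join(f"{k} {v:.1%}" for k, v in table.metrics().items())
        print(f"{name}: {summary}")
```

With these assumed counts, the NPV, sensitivity, specificity, and share of polyps predicted non-neoplastic agree with the reported values to one decimal place. Accuracy for CADx-A computes to 93.1% against the 93.2% reported, which is a reminder that the counts are a reconstruction rather than source data.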
The NPV of a CADx-assisted optical diagnosis for ≤5-mm rectosigmoid polyps (high confidence) was 97.7% (95% CI, 96.0%–99.5%), with no statistically significant difference compared with unassisted interpretation (Figure 1B). Based on the 2020 USMSTF and 2020 ESGE guidelines, the agreement between unassisted interpretation and histopathology in surveillance interval assignment was 92.6% (163/176; 95% CI, 87.4%–95.9%) and 98.9% (174/176; 95% CI, 95.5%–99.8%), respectively. There was total agreement between unassisted and CADx-assisted interpretation in surveillance interval assignment based on both the 2020 USMSTF and ESGE guidelines.

According to our study, there is no variability in clinically relevant outcomes when an optical diagnosis of ≤5-mm colorectal polyps is performed by either of the 2 CADx systems; both matched the cutoff required for the leave-in-situ and resect-and-discard strategies when adopting the European surveillance guidelines [11]. This clinical equivalence was achieved despite a different degree of concordance between the 2 CADx systems when distal and proximal lesions were analyzed separately.

Our study confirmed the somewhat unexpected findings of 3 previous studies [3, 4, 9] on the lack of benefit when passing from CADx-unassisted to CADx-assisted endoscopist diagnosis, in terms of both technical accuracy and clinical outcomes. This was related to the very high performance of unassisted endoscopist diagnosis, as shown by the 97.8% NPV for ≤5-mm rectosigmoid polyps and the ≥90% concordance with histology in postpolypectomy surveillance intervals.
In addition, our study shows the relevance of the level of confidence in optical diagnosis. The human endoscopist was the only one to achieve ≥90% agreement in postpolypectomy surveillance intervals when adopting the American guidelines, mainly because of a very high specificity. This confirms the complexity of the human–machine interaction, which should not be reduced to the stand-alone performance of the machine. However, the very high accuracy of unassisted endoscopists in our academic center is unlikely to mirror real performance in the community setting. Thus, new studies specifically focusing on nontertiary centers are needed to show the additional benefit, if any, of CADx for the leave-in-situ strategy for colorectal polyps.

The main strength of our study is the innovative methodology of in vivo intrapolyp comparison between 2 CADx systems, which provides a direct estimate of system concordance and excludes any difference in disease prevalence and operator bias that would occur in a parallel randomized trial. This dual methodology may be applied in the future to other AI tasks, such as polyp identification or sizing. The study also has limitations. First, we could not avoid exposing the same operator to both systems, so a possible effect on the confidence of the observer cannot be excluded. It is unlikely, however, that the automatic output of each CADx system could have been affected by such information. Second, this was designed as a noninferiority rather than an equivalence trial because the only information available at the time of the design was the 97.6% NPV of CADx-B, so that the lower bound implied by our 5% noninferiority margin would still have exceeded the 90% NPV cutoff required for clinical incorporation.

A high degree of concordance in clinical outcomes was shown when directly comparing 2 different CADx systems in vivo.
This strengthens our confidence in the standardization of performance that may be achieved with the incorporation of AI in clinical practice, irrespective of the availability of multiple systems.

The COMBO Study Group includes Marco Spadaccini (1, 2), Carmelo Selvaggio (1), Giulio Antonelli (3, 4), Kareem Khalaf (1), Tommy Rizkala (1), Elisa Ferrara (1, 2), Victor Savevski (5), Roberta Maselli (1, 2), Alessandro Fugazza (1, 2), Antonio Capogreco (1, 2), Valeria Poletti (1, 2), Silvia Ferretti (1, 2), Asma Alkandari (6), and Loredana Correale (1, 2); from the (1) Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Italy; (2) Endoscopy Unit, Humanitas Clinical and Research Hospital, IRCCS, Rozzano, Italy; (3) Department of Anatomical, Histological, Forensic Medicine and Orthopaedics Sciences, "Sapienza" University of Rome, Italy; (4) Gastroenterology and Digestive Endoscopy Unit, Ospedale dei Castelli Hospital, Ariccia, Rome, Italy; (5) Artificial Intelligence Research, Humanitas Clinical and Research Center, IRCCS, Rozzano, Italy; and the (6) Endoscopy Unit, Thanyan Alghanim Center for Gastroenterology and Hepatology, Alamiri Hospital, Kuwait, Kuwait.

Cesare Hassan, MD (Data curation: Lead; Formal analysis: Lead; Funding acquisition: Lead; Investigation: Lead; Methodology: Lead; Software: Lead; Supervision: Lead; Validation: Lead; Visualization: Lead; Writing – original draft: Lead; Writing – review & editing: Lead).
Prateek Sharma, MD (Supervision: Equal; Validation: Lead).
Yuichi Mori, MD (Validation: Equal; Visualization: Equal).
Michael Bretthauer, Professor of Medicine (Supervision: Equal; Validation: Equal; Visualization: Equal).
Douglas K. Rex, MD (Supervision: Equal; Validation: Equal; Visualization: Equal).
Alessandro Repici, MD (Conceptualization: Equal; Data curation: Equal; Funding acquisition: Equal; Investigation: Equal; Writing – original draft: Equal; Writing – review & editing: Equal).
Briefly, the CAD-EYE system is a real-time convolutional neural network AI system developed for polyp characterization. It is used in the so-called blue-light mode. The system provides an optical polyp diagnosis through polyp identification in the “location map,” colored brackets surrounding the endoscopic image, and a prediction of hyperplastic or neoplastic histology (ie, nonadenoma or adenoma).

The GI-Genius system is composed of 2 modules, 1 for polyp detection (computer-aided detection) and 1 for characterization (CADx). The computer-aided detection module has been described and clinically tested elsewhere. The CADx module is used in standard white-light mode and activates automatically when a lesion is visible on the colonoscopy screen. The module uses a convolutional neural network classifier that runs in real time on several images of the same lesion and provides a prediction of adenoma or nonadenoma. If a prediction cannot be reached, a “no prediction” output is provided.

In the trial, the 2 systems were run in real time on the same endoscopy tower (ELUXEO™ VP-7000 video processor and ELUXEO™ BL7000 light source; Fujifilm Co, Tokyo, Japan), producing 2 concomitant and independent outputs. Each system displayed its real-time output on a separate screen, and the endoscopist could use either screen (Figure 1).

All patients underwent routine split-dose bowel preparation according to a standardized protocol used at the study center. Both experienced and nonexperienced endoscopists participated in the study. Experienced endoscopists had performed more than 2000 colonoscopies and were trained in optical diagnosis with blue-light imaging, whereas nonexperienced endoscopists were gastroenterology trainees who had not yet completed endoscopic training (<1000 colonoscopies with or without CADx assistance). Bowel preparation was evaluated and graded by the endoscopist performing the examination using the Boston Bowel Preparation Scale.
All detected polyps were cleaned of mucus, positioned at “6 o’clock” on the endoscopy screen if possible, and framed at the nearest possible distance while keeping focus. The endoscopist first categorized each lesion as adenoma or nonadenoma by using white-light and blue-light imaging and scored the diagnosis confidence level (high or low) without using the CADx AI systems. The AI systems were then switched on, and the output (adenoma vs nonadenoma) automatically provided by the 2 AI systems (CAD-EYE [CADx-A] and GI-Genius [CADx-B]) appeared separately on 2 individual screens. The outputs of the 2 systems were recorded by an endoscopy assistant, irrespective of the endoscopist’s previous prediction and level of confidence. The first output recorded was that of GI-Genius in white light and the second that of CAD-EYE in blue light. Next, the colonoscopist again categorized the polyp (adenoma or nonadenoma) and scored the confidence level of the prediction as high or low. All polyps were then removed, placed in separate jars with 10% buffered formalin solution, and sent for histopathologic evaluation by 1 expert pathologist according to the Vienna classification. The pathologist was blinded to the diagnoses made by the AI systems and the endoscopist.

The primary study endpoint was the comparison of the NPV for adenomatous histology of the CADx optical diagnosis for diminutive (≤5 mm) rectosigmoid polyps with the 2 systems. Secondary endpoints were (1) concordance and comparison in optical diagnosis performance of the 2 CADx systems for diminutive (≤5 mm) rectosigmoid polyps; (2) concordance and comparison in optical diagnosis performance of the 2 CADx systems for all colorectal lesions; (3) agreement in assignment of postpolypectomy surveillance intervals according to established guidelines (ESGE 2020 [4: Rondonotti E, et al. Endoscopy. 2023;55:14-22] and USMSTF 2020) between the assignment based on each CADx diagnosis (that is, CADx optical diagnosis for diminutive [≤5 mm] polyps combined with histology for larger [≥6 mm] polyps) and the assignment based on histology alone regardless of lesion size; and (4) optical diagnosis performance (sensitivity, specificity, accuracy, positive predictive value [PPV], and NPV) of the endoscopist after cumulative assistance by the 2 CADx systems.

The study hypothesis was that CAD-EYE is noninferior to GI-Genius in terms of NPV for ≤5-mm rectosigmoid polyps. The sample size was based on a previous estimate of the accuracy of GI-Genius. The prevalence of adenomatous diminutive polyps in the study sample was estimated to be 12.7% based on a previous analysis. In detail, the diagnostic performance of GI-Genius was estimated at a sensitivity of 82% and a specificity of 93.2%. Based on the prevalence, sensitivity, and specificity, the NPV for GI-Genius was calculated to be 97.6%. With the estimated NPV and a noninferiority margin of 5%, a sample size of 310 diminutive rectosigmoid polyps would achieve 80% power at a 5% significance level. This sample size was derived by using the equations described in Takahashi et al.

For each pairwise comparison, 2 × 2 contingency tables were used to present the results and calculate the diagnostic accuracy estimates with 95% CIs. The unit of assessment for the 2 × 2 contingency tables used to assess sensitivity, specificity, NPV, PPV, positive likelihood ratio, and negative likelihood ratio was the polyp. Thus, polyps within the same patient were considered independent observations. Given the paired nature of the test results, agreement in histology prediction (nonadenomatous or adenomatous polyps) was analyzed by using McNemar's test.
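The sample-size reasoning above derives an expected NPV from prevalence, sensitivity, and specificity. A minimal sketch of that step via Bayes' rule is shown here; the function names are hypothetical, and only the three input values come from the text.

```python
def npv_from_prevalence(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Negative predictive value from prevalence and test characteristics.

    NPV = P(no adenoma | negative test)
        = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    """
    true_negative_mass = specificity * (1.0 - prevalence)
    false_negative_mass = (1.0 - sensitivity) * prevalence
    return true_negative_mass / (true_negative_mass + false_negative_mass)


def ppv_from_prevalence(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value, the mirror-image calculation."""
    true_positive_mass = sensitivity * prevalence
    false_positive_mass = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


if __name__ == "__main__":
    # Inputs quoted in the text: 12.7% prevalence, 82% sensitivity, 93.2% specificity.
    expected_npv = npv_from_prevalence(0.127, 0.82, 0.932)
    print(f"Expected NPV: {expected_npv:.1%}")
```

With the rounded inputs quoted in the text, this formula gives roughly 97.3%, close to but slightly below the 97.6% stated; the gap presumably reflects rounding of the inputs or details not given here. Either value sits well above the 90% threshold, which is the point the noninferiority margin relies on.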
McNemar’s tests were also used for the head-to-head comparisons of sensitivity and specificity between the 2 CADx systems and the endoscopists’ assessments. For this reason, we excluded from the primary analysis those polyps for which only 1 of the 2 CADx systems provided a prediction (per-protocol analysis). Results from McNemar’s test comparing sensitivities and specificities were presented as differences of proportions. Given that the PPV and NPV depend on the prevalence of disease, a generalized estimating equation logistic regression model was used to compare the NPV and PPV. Agreement between the 2 CADx systems was assessed by calculating κ values. A 2-sided P < .05 was indicative of statistical significance.

Surveillance interval agreement of CADx-assisted optical diagnosis with pathology-based management using ESGE and USMSTF recommendations was evaluated and presented as a proportion with a 95% CI. For the purpose of this analysis, any AI prediction of adenoma was considered a low-risk adenoma and any prediction of nonadenoma a hyperplastic polyp. The primary analysis was performed excluding polyps lacking an AI diagnosis.
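As an illustration of the κ statistic used to assess agreement between the 2 CADx systems, the sketch below computes Cohen's κ from a paired 2 × 2 table of predictions. The cell counts are not given in the text; they are back-calculated from the reported concordance (302/319) and from the number of adenoma predictions implied by each system's sensitivity and specificity, so they are an assumption. The function and variable names are hypothetical.

```python
def cohen_kappa(both_adenoma: int, only_a_adenoma: int,
                only_b_adenoma: int, both_nonadenoma: int) -> float:
    """Cohen's kappa for two raters giving a binary call on the same polyps."""
    n = both_adenoma + only_a_adenoma + only_b_adenoma + both_nonadenoma
    observed = (both_adenoma + both_nonadenoma) / n

    # Marginal rate of "adenoma" calls for each system.
    a_rate = (both_adenoma + only_a_adenoma) / n
    b_rate = (both_adenoma + only_b_adenoma) / n

    # Agreement expected by chance if the two systems were independent.
    expected = a_rate * b_rate + (1.0 - a_rate) * (1.0 - b_rate)
    return (observed - expected) / (1.0 - expected)


# Assumption: paired counts reconstructed from reported summary figures,
# not taken directly from the article.
PAIRED_COUNTS = {
    "both_adenoma": 46,      # CADx-A adenoma, CADx-B adenoma
    "only_a_adenoma": 4,     # CADx-A adenoma, CADx-B nonadenoma
    "only_b_adenoma": 13,    # CADx-A nonadenoma, CADx-B adenoma
    "both_nonadenoma": 256,  # CADx-A nonadenoma, CADx-B nonadenoma
}

if __name__ == "__main__":
    kappa = cohen_kappa(**PAIRED_COUNTS)
    total = sum(PAIRED_COUNTS.values())
    agree = PAIRED_COUNTS["both_adenoma"] + PAIRED_COUNTS["both_nonadenoma"]
    print(f"Observed agreement: {agree}/{total} = {agree / total:.1%}")
    print(f"Cohen's kappa: {kappa:.2f}")
```

Under these assumed counts, the observed agreement is 302/319 and κ rounds to 0.81, in line with the reported figures. κ is lower than raw agreement because most polyps are nonadenomas, so a large share of agreement would be expected by chance alone.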
In a secondary analysis, all polyps were analyzed assuming that nonadenomatous polyps without an AI diagnosis were false-positive diagnoses.

Supplementary Table 1. Patient and Polyp Characteristics

| Characteristics | Values |
| --- | --- |
| Patients, n | 176 |
| Age, y, mean (SD) | 60.2 (9.7) |
| Sex: Male | 107 (60.8) |
| Sex: Female | 69 (39.2) |
| Indication for colonoscopy: Positive fecal immunochemical test | 16 (9.1) |
| Indication for colonoscopy: Screening | 70 (39.8) |
| Indication for colonoscopy: Postpolypectomy surveillance | 56 (31.8) |
| Indication for colonoscopy: Clinical signs or symptoms | 34 (19.3) |
| Polyps, n, total | 543 |
| Polyp size: 1–5 mm | 478 (88.0) |
| Polyp size: 6–9 mm | 42 (7.7) |
| Polyp size: ≥10 mm | 23 (4.3) |
| ≤5-mm rectosigmoid polyps, n | 325 |
| Morphology: Ip (pedunculated) | 2 (0.6) |
| Morphology: Is (sessile) | 66 (20.0) |
| Morphology: Isp (semipedunculated) | 0 (0) |
| Morphology: IIa (flat raised) | 223 (68.6) |
| Morphology: IIb (flat) | 11 (3.4) |
| Location: Sigmoid | 124 (38.2) |
| Location: Rectum | 201 (61.8) |
| Histology: Adenoma | 44 (13.5) |
| Histology: Adenoma, nonadvanced | 43 (97.7) |
| Histology: Adenoma, advanced (high-grade dysplasia, villous) | 1 (2.3) |
| Histology: Nonadenoma | 281 (86.8) |
| Histology: Nonadenoma, hyperplastic | 279 (99.3) |
| Histology: Nonadenoma, sessile serrated lesion | 2 (0.7) |
| Histology: Nonadenoma, traditional serrated adenoma | 0 (0.0) |
| Histology: Nonadenoma, inflammatory/normal mucosa | 0 (0.0) |
| Histology: Not retrieved | 0 (0.0) |

Values are n (%) unless otherwise defined. Detailed data were provided for diminutive rectosigmoid polyps according to the primary endpoint. Additional data on other polyps are provided in Supplementary Table 2.

Supplementary Table 2. Comparison of Standalone Diagnostic Accuracy of Two Different CADx Systems (CAD-EYE [CADx-A] and GI-Genius [CADx-B]) for Diagnosis of Diminutive Adenomas According to Colon Location of Polyp

| | CADx-A | CADx-B | Ratio (a) | P |
| --- | --- | --- | --- | --- |
| Rectosigmoid polyps ≤5 mm (b) (n = 319) | | | | |
| NPV | 97.0 (95.0–99.0) | 97.7 (95.9–99.5) | 0.99 (0.98–1.01) | .196 |
| PPV | 72.0 (59.6–84.4) | 64.4 (52.2–76.6) | 1.12 (0.97–1.28) | .119 |
| Sensitivity | 81.8 (70.4–93.2) | 86.4 (76.2–96.5) | –0.05 (–0.107 to 0.06) | .157 |
| Specificity | 94.9 (92.3–97.5) | 92.4 (89.2–95.5) | 0.03 (–0.002 to 0.053) | .071 |
| Positive diagnostic likelihood ratio | 16.1 (9.50–27.3) | 11.3 (7.4–17.3) | 1.42 (0.91–2.30) | .125 |
| Negative diagnostic likelihood ratio | 0.19 (0.10–0.36) | 0.15 (0.07–0.31) | 1.30 (0.87–1.94) | .203 |
| Polyps ≤5 mm proximal to the rectosigmoid (b) (n = 149) | | | | |
| NPV | 81.1 (70.6–91.7) | 93.2 (85.7–100.0) | 0.87 (0.75–1.01) | .062 |
| PPV | 79.2 (71.0–87.3) | 80.0 (72.3–87.7) | 1.00 (0.95–1.06) | .929 |
| Sensitivity | 88.5 (81.8–95.2) | 96.6 (92.7–100.0) | –0.08 (–0.001 to –0.17) | .052 |
| Specificity | 69.4 (57.9–80.8) | 66.1 (54.3–77.9) | 0.03 (–0.07 to 0.137) | .479 |
| Positive diagnostic likelihood ratio | 2.9 (2.0–4.2) | 2.9 (2.0–4.0) | 1.01 (0.76–1.46) | .929 |
| Negative diagnostic likelihood ratio | 0.17 (0.09–0.31) | 0.05 (0.02–0.16) | 3.18 (0.87–11.63) | .090 |

Values are % (95% CI). Results from the generalized estimating equation logistic regression model to compare PPV and NPV are presented as relative predictive values. All ratios are presented as CADx-B relative to CADx-A.
(a) Results from McNemar’s test to compare sensitivities and specificities are presented as difference of proportions.
(b) Excluding polyps lacking an AI diagnosis.