Authors
Lixin Gong, Min Wang, Lei Shu, Jie He, Bin Qin, Jiacheng Xu, Wei Su, Di Dong, Hao Hu, Jie Tian, Ping‐Hong Zhou
Abstract
Background and Aims: The detection rate for early gastric cancer (EGC) is unsatisfactory, and mastering the diagnostic skills of magnifying endoscopy with narrow-band imaging (ME-NBI) requires extensive expertise and experience. We aimed to develop an EGC captioning model (EGCCap) to automatically describe the visual characteristics of ME-NBI images for endoscopists.
Methods: ME-NBI images (n = 1886) from 294 cases were collected from multiple centers, and 5658 corresponding text descriptions were designed following the simple EGC diagnostic algorithm. EGCCap was developed using the multiscale meshed-memory transformer. We conducted comprehensive evaluations of EGCCap, including quantitative and qualitative performance, generalization, robustness, interpretability, and assistance value analyses. The metrics used were BLEU scores, CIDEr, METEOR, ROUGE, SPICE, accuracy, sensitivity, and specificity. Two-sided statistical tests were conducted, and P < .05 was considered statistically significant.
Results: EGCCap achieved satisfactory captioning performance, outputting correct, coherent, and clinically meaningful sentences in the internal test cohort (BLEU1 = 52.434, CIDEr = 36.734, METEOR = 27.823, ROUGE = 49.949, SPICE = 35.548), and maintained over 80% performance when applied to other centers or corrupted data. The diagnostic ability of endoscopists improved with the assistance of EGCCap, and the improvement was especially significant (P < .05) for junior endoscopists. Endoscopists gave EGCCap an average rating of 7.182, indicating acceptance of EGCCap.
Conclusions: EGCCap exhibited promising captioning performance and demonstrated satisfactory generalization, robustness, and interpretability. Our study showed its potential value in aiding and improving the diagnosis of EGC and in facilitating the future development of automated reporting.