Keywords
Computer science; Artificial intelligence; Colonoscopy; Coding (set theory); Deep learning; Proportion (ratio); Gold standard (test); Feature extraction; Computer vision; Machine learning; Colorectal cancer; Cancer; Radiology; Medicine; Physics; Set (abstract data type); Quantum mechanics; Internal medicine; Programming language
Authors
Vanshali Sharma, Pradipta Sasmal, M. K. Bhuyan, Pradip K. Das, Yuji Iwahori, Kunio Kasugai
Source
Journal: IEEE Transactions on Automation Science and Engineering
Publisher: Institute of Electrical and Electronics Engineers
Date: 2023-10-02
Volume/Issue: 1-14
Citations: 3
Identifier
DOI: 10.1109/tase.2023.3315518
Abstract
Colonoscopy video acquisition has increased tremendously for retrospective analysis, comprehensive inspection, and polyp detection in the diagnosis of colorectal cancer (CRC). However, extracting meaningful clinical information from colonoscopy videos requires an enormous amount of reviewing time, which places a considerable burden on surgeons. To reduce this manual effort, we propose the first end-to-end automated multi-stage deep learning framework to extract an adequate number of clinically significant frames, i.e., keyframes, from colonoscopy videos. The proposed framework comprises multiple stages that employ different deep learning models to select keyframes, which are high-quality, non-redundant polyp frames that capture multiple views of polyps. In one stage of the framework, we also propose a novel multi-scale attention-based model, YcOLOn, for polyp localization, which generates ROIs and prediction scores crucial for obtaining keyframes. We further designed a GUI application for navigating the different stages. Extensive evaluation in real-world scenarios involving patient-wise and cross-dataset validations shows the efficacy of the proposed approach. The framework removes 96.3% and 94.02% of frames, reduces detection processing time by 38.28% and 59.99%, and increases mAP by 2% and 5% on the SUN database and CVC-VideoClinicDB, respectively. The source code is available at https://github.com/Vanshali/KeyframeExtraction

Note to Practitioners: The widespread acceptance of colonoscopy as a gold standard for CRC screening is constrained by the massive amount of data recorded during the procedure, which must be reviewed manually. Such manual review is burdensome and induces human diagnostic errors. This article proposes an automated framework to extract keyframes (important frames) from colonoscopy videos that efficiently represent the clinically relevant information captured in the video streams. This is achieved by automatically removing uninformative and highly correlated frames that do not add to the clinical findings. The approach ensures diversity among keyframes and provides clinicians with multiple views of polyps for easier resection. In addition, the proposed multi-scale attention-based model improves polyp localization performance, which further refines the keyframe selection process. Comprehensive experimental results corroborate that discarding insignificant frames can enhance polyp detection and localization performance while reducing computational requirements. The study estimates a 30% to 60% time saving for clinicians during video screening. In clinical practice, the proposed automated framework and the accompanying GUI would enable surgeons to better visualize the essential data with minimal manual intervention and would assist in precise polyp resection.
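The abstract does not detail the individual stages, but the core keyframe idea it describes (keep high-quality, non-redundant frames and discard near-duplicates) can be sketched. Below is a minimal, hypothetical Python sketch of the redundancy-removal step using histogram correlation between a candidate frame and the last kept frame; the function names, the 64-bin grayscale histogram, and the 0.95 similarity threshold are all illustrative assumptions, not the authors' actual pipeline (see the linked repository for the real implementation).

```python
# Hypothetical sketch of redundancy removal: drop frames that are highly
# correlated with the most recently kept frame. Not the authors' code.
import cv2
import numpy as np

def frame_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Correlation between 64-bin grayscale histograms of two BGR frames."""
    ha = cv2.calcHist([cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)], [0], None, [64], [0, 256])
    hb = cv2.calcHist([cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)], [0], None, [64], [0, 256])
    cv2.normalize(ha, ha)
    cv2.normalize(hb, hb)
    return float(cv2.compareHist(ha, hb, cv2.HISTCMP_CORREL))

def select_keyframes(video_path: str, sim_threshold: float = 0.95) -> list[int]:
    """Return indices of frames that are not near-duplicates of the previous keyframe."""
    cap = cv2.VideoCapture(video_path)
    keyframes, last_kept, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Keep the frame only if it differs enough from the last kept one.
        if last_kept is None or frame_similarity(frame, last_kept) < sim_threshold:
            keyframes.append(idx)
            last_kept = frame
        idx += 1
    cap.release()
    return keyframes
```

In the paper's multi-stage pipeline, a step like this would sit alongside quality filtering and the YcOLOn polyp-localization scoring; it stands alone here only to make the near-duplicate-removal idea concrete.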