Keywords
Computer science
Segmentation
Artificial intelligence
Frame (networking)
Pattern recognition (psychology)
Context (archaeology)
Annotation
Image segmentation
Computer vision
Telecommunications
Paleontology
Biology
Authors
Xinkai Zhao, Wei Xing Zheng, Shuangyi Tan, De-Jun Fan, Zhen Li, Xiang Wan, Guanbin Li
Identifier
DOI:10.1007/978-3-031-16440-8_44
Abstract
Deep learning-based polyp segmentation approaches have achieved great success on image datasets. However, frame-by-frame annotation of polyp videos requires a large amount of work, which limits the application of polyp segmentation algorithms to clinical videos. In this paper, we address the semi-supervised video polyp segmentation task, which requires only sparsely annotated frames to train a video polyp segmentation network. We propose a novel spatial-temporal attention network composed of a Temporal Local Context Attention (TLCA) module and a Proximity Frame Time-Space Attention (PFTSA) module. Specifically, the TLCA module refines the prediction of the current frame using the prediction results of nearby frames in the video clip. The PFTSA module utilizes a simple yet powerful hybrid transformer architecture to efficiently capture long-range dependencies in time and space. Combined with consistency constraints, the network fuses representations of proximity frames at different scales to generate pseudo-masks for unlabeled images. We further propose a pseudo-mask-based training method. Additionally, we re-masked a subset of LDPolypVideo and used it as a semi-supervised polyp segmentation dataset for our experiments. Experimental results show that our proposed semi-supervised approach outperforms existing image-level semi-supervised and fully supervised methods with sparse annotation, at a speed of 135 fps. The code is available at github.com/ShinkaiZ/SSTAN .
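To make the TLCA idea concrete, the following is a minimal illustrative sketch (not the authors' implementation, which lives in the linked repository): a frame's predicted mask is refined as a similarity-weighted average of the masks predicted for nearby frames in a local temporal window. All function and variable names here are hypothetical, and per-frame feature vectors stand in for whatever embedding the real network would use to score frame similarity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_local_context_refine(preds, feats, window=2):
    """Illustrative nearby-frame refinement (hypothetical sketch of TLCA).

    preds: (T, H, W) per-frame segmentation probability maps
    feats: (T, D)    per-frame feature embeddings used for similarity
    window: number of neighbouring frames considered on each side

    Each frame's mask becomes an attention-weighted average of the masks
    in its local temporal window, with weights given by a softmax over
    cosine similarities between frame embeddings.
    """
    T = preds.shape[0]
    # L2-normalize features once so dot products are cosine similarities
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    refined = np.empty_like(preds)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        idx = np.arange(lo, hi)
        sim = f[idx] @ f[t]          # similarity to the current frame
        w = softmax(sim)             # attention weights over the window
        # weighted sum of neighbouring masks: (k,) x (k, H, W) -> (H, W)
        refined[t] = np.tensordot(w, preds[idx], axes=1)
    return refined

# Toy example: 5 frames of 4x4 probability maps with 8-d features
rng = np.random.default_rng(0)
preds = rng.random((5, 4, 4))
feats = rng.random((5, 8))
out = temporal_local_context_refine(preds, feats)
print(out.shape)  # (5, 4, 4)
```

Because each refined mask is a convex combination of probability maps, the outputs remain valid probabilities; the window size trades temporal context against sensitivity to fast-moving polyps.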