Cascaded Inner-Outer Clip Retformer for Ultrasound Video Object Segmentation

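Computer vision; Computer science; Artificial intelligence; Segmentation; Image segmentation; Object (grammar); Radiology; Medicine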

Authors

Jialu Li,Lei Zhu,Zhaohu Xing,Baoliang Zhao,Ying Hu,Faqin Lv,Q. Wang

Source

Journal: IEEE Journal of Biomedical and Health Informatics [Institute of Electrical and Electronics Engineers]
Date: 2024-01-01 Volume/Issue: : 1-16

Links

nih.gov, doi.org

Identifiers

DOI: 10.1109/jbhi.2024.3464732

Abstract

Computer-aided ultrasound (US) imaging is an important prerequisite for early clinical diagnosis and treatment. Because US image quality is poor and tumor areas are blurry, recent memory-based video object segmentation (VOS) models achieve frame-level segmentation by performing intensive similarity matching among past frames, which inevitably results in computational redundancy. Furthermore, the attention mechanism used in recent models allocates the same attention level across all spatial-temporal memory features without distinction, which may degrade accuracy. In this paper, we first build a larger annotated benchmark dataset for breast lesion segmentation in ultrasound videos, then propose a lightweight clip-level VOS framework that achieves higher segmentation accuracy while maintaining speed. The Inner-Outer Clip Retformer is proposed to extract spatial-temporal tumor features in parallel. Specifically, the proposed Outer Clip Retformer extracts the tumor movement feature from past video clips to locate the tumor position in the current clip, while the Inner Clip Retformer extracts detailed features of the current tumor that can produce more accurate segmentation results. A Clip Contrastive loss function is further proposed to align the extracted tumor features along both the spatial and temporal dimensions to improve segmentation accuracy. In addition, the Global Retentive Memory is proposed to maintain the complementary tumor features with lower computing resources, which can generate coherent temporal movement features. In this way, our model can significantly improve spatial-temporal perception without a large increase in parameters, achieving more accurate segmentation results while maintaining a faster segmentation speed. Finally, we conduct extensive experiments to evaluate our proposed model on several video object segmentation datasets; the results show that our framework outperforms state-of-the-art segmentation methods.
