Computer science
Image segmentation
Artificial intelligence
Joint (building)
Parsing
Computer vision
Minimum bounding box
Redundancy (engineering)
Segmentation
Jumping surveillance
Medical imaging
Point (geometry)
Image (mathematics)
Robustness (evolution)
Semantics (computer science)
Natural language processing
Pattern recognition (psychology)
Visualization
Heuristic
Feature extraction
Object detection
Medical diagnosis
Machine learning
Information retrieval
Text detection
Authors
Xu Zhang, Huangxuan Zhao, Lefei Zhang, Yuan Xiong
Identifier
DOI:10.1109/jbhi.2025.3607023
Abstract
The Segment Anything Model (SAM) has attracted considerable attention for its impressive performance and shows potential in medical image segmentation. Compared to SAM's native point and bounding box prompts, text prompts offer a simpler and more efficient alternative in the medical field, yet this approach remains relatively underexplored. In this paper, we propose a SAM-based framework that integrates a pre-trained vision-language model to generate referring prompts, with SAM handling the segmentation task. The outputs of multimodal models such as CLIP serve as input to SAM's prompt encoder. A critical challenge stems from the inherent complexity of medical text descriptions: they typically encompass anatomical characteristics, imaging modalities, and diagnostic priorities, resulting in information redundancy and semantic ambiguity. To address this, we propose a text decomposition-recomposition strategy. First, clinical narratives are parsed into atomic semantic units (e.g., appearance, location, pathology). These elements are then recombined into optimized text expressions. We employ a cross-attention module among the multiple texts to interact with the joint features, ensuring that the model focuses on features corresponding to effective descriptions. To validate the effectiveness of our method, we conducted experiments on several datasets. Compared to native SAM with geometric prompts, our model shows improved performance and usability.
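To make the prompt path concrete, below is a minimal sketch of the idea the abstract describes: several atomic text units are encoded, projected into SAM's prompt-embedding space, and cross-attended against joint image features so that informative descriptions dominate. This is an illustrative reconstruction, not the authors' code: the class name TextPromptAdapter, the 512-d CLIP-style text features, the 256-d SAM prompt dimension, and the random stand-in tensors are all assumptions.

import torch
import torch.nn as nn

class TextPromptAdapter(nn.Module):
    """Hypothetical adapter: decomposed text descriptions -> SAM-style prompt tokens."""
    def __init__(self, clip_dim=512, sam_dim=256, n_heads=8):
        super().__init__()
        # Project CLIP-style text embeddings into SAM's prompt-embedding space.
        self.proj = nn.Linear(clip_dim, sam_dim)
        # Cross-attention between the multiple text units (queries) and the
        # joint features (keys/values), so the model weights effective descriptions.
        self.cross_attn = nn.MultiheadAttention(sam_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(sam_dim)

    def forward(self, text_feats, joint_feats):
        # text_feats:  (B, K, clip_dim) -- K atomic units (appearance, location, ...)
        # joint_feats: (B, N, sam_dim)  -- flattened joint image/text features
        q = self.proj(text_feats)                       # (B, K, sam_dim)
        attended, _ = self.cross_attn(q, joint_feats, joint_feats)
        return self.norm(q + attended)                  # (B, K, sam_dim) prompt tokens

# Toy usage: three atomic units parsed from one clinical sentence.
atomic_units = ["hypoechoic nodule", "left thyroid lobe", "irregular margin"]
B, K, N = 1, len(atomic_units), 64 * 64
clip_text_feats = torch.randn(B, K, 512)   # stand-in for CLIP text embeddings
joint_feats = torch.randn(B, N, 256)       # stand-in for SAM image-encoder features
prompt_tokens = TextPromptAdapter()(clip_text_feats, joint_feats)
print(prompt_tokens.shape)                 # torch.Size([1, 3, 256]); would feed SAM's mask decoder

In a real pipeline the stand-in tensors would come from a frozen CLIP text encoder and SAM's image encoder, and the resulting tokens would replace the sparse embeddings normally produced by point or box prompts.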