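Computer science
Transformer
Embedding
Segmentation
Artificial intelligence
Image segmentation
Computer vision
Generalization
Pattern recognition (psychology)
Mathematics
Electrical engineering
Mathematical analysis
Voltage
Engineering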
Authors
G. Jignesh Chowdary,Zhaozheng Yin
Identifier
DOI:10.1007/978-3-031-43901-8_59
Abstract
Diffusion models have shown their power on various generation tasks. Applying diffusion models to medical image segmentation, however, requires removing two roadblocks. First, the semantic features used to condition the diffusion process are not well aligned with the noise embedding. Second, the U-Net backbone employed in these diffusion models is not sensitive to contextual information, which is essential during the reverse diffusion process for accurate pixel-level segmentation. To overcome these limitations, we present a cross-attention module that strengthens the conditioning on the source images, and a transformer-based U-Net with multi-sized windows that extracts contextual information at various scales. Evaluated on five benchmark datasets with different imaging modalities (Kvasir-Seg, CVC Clinic DB, ISIC 2017, ISIC 2018, and Refuge), our diffusion transformer U-Net shows strong generalization ability and outperforms all the state-of-the-art models on these datasets.
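The abstract names two components, cross-attention conditioning and attention over multi-sized windows, without giving their formulas. The sketch below illustrates both ideas in plain Python, assuming standard single-head scaled dot-product attention. The function names `cross_attention` and `multi_window_attention`, the choice of window sizes, and the averaging used to fuse the window scales are hypothetical illustrations. They are not the authors' implementation, which this page does not describe.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Scaled dot-product attention (hypothetical helper).

    Assumption: queries come from the noise embedding and keys/values come from
    source-image features, so each noise token gathers image context.
    """
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_window_attention(grid, sizes, w_q, w_k, w_v):
    """Self-attention inside non-overlapping windows of several sizes (hypothetical).

    grid is an H x W array of token vectors. For each window size, tokens attend
    only to tokens in the same window; the per-size outputs are then averaged
    (an assumed fusion rule) so each position mixes small- and large-scale context.
    """
    h, w = len(grid), len(grid[0])
    d_out = len(w_v[0])
    fused = [[[0.0] * d_out for _ in range(w)] for _ in range(h)]
    for win in sizes:
        assert h % win == 0 and w % win == 0, "window size must divide the grid"
        for r in range(0, h, win):
            for c in range(0, w, win):
                coords = [(r + i, c + j) for i in range(win) for j in range(win)]
                tokens = [grid[y][x] for y, x in coords]
                # Self-attention is cross-attention with the same tokens as context.
                out, _ = cross_attention(tokens, tokens, w_q, w_k, w_v)
                for (y, x), vec in zip(coords, out):
                    for t in range(d_out):
                        fused[y][x][t] += vec[t] / len(sizes)
    return fused


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    noise = [[1.0, 0.0], [0.0, 1.0]]               # two noise-embedding tokens
    image = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three image-feature tokens
    out, weights = cross_attention(noise, image, identity, identity, identity)
    print(out, weights)
    grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
    print(multi_window_attention(grid, (2, 4), identity, identity, identity)[0][0])
```

In a real model the projections would be learned, attention would be multi-head, and windows might be shifted between layers. Identity projections are used here only to keep the example small. Running windows of size 2 and 4 over the same 4 x 4 grid shows the intended trade-off: small windows capture local detail while large windows capture wider context.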