Abstract
The U-Net has achieved great success in medical image segmentation. Most U-Nets follow an encoding-decoding-decision inference path and propagate features from the encoder to the decoder. However, these traditional approaches do not exploit the semantic differences among organs and image modalities, which makes them task-unaware and limits their generalization. To address these issues, this paper proposes a Coarse-Fine U-Net (CFU-Net) architecture with two embedded U-Nets, and designs a Multi-Level Attention Module (MLAM) to carry out multi-level information interaction. CFU-Net introduces an additional decoding path at a lower level, forming two partly coupled U-Nets of different depths, namely the coarse U-Net and the fine U-Net. The coarse U-Net produces a coarse prediction, which is then used to guide the decoding of the fine U-Net. MLAM adjusts feature propagation in the fine U-Net by exploiting the interactions of multi-level information, including decision information, contextual information, and long-range dependencies. In addition, CFU-Net is built with dynamic convolution to improve the adaptability of the convolution operations. The performance of CFU-Net is evaluated on four datasets of different modalities: ISIC2018, BUSI, Kvasir-SEG, and LiTS. In terms of Dice/Intersection-over-Union (IoU) scores, CFU-Net obtains improvements of 0.82%/1.62%, 4.34%/6.89%, 5.23%/9.30%, and 5.11%/5.18% over the state-of-the-art UNeXt on the ISIC2018, BUSI, Kvasir-SEG, and LiTS datasets, respectively. Moreover, the superiority of CFU-Net on segmentation tasks across different modalities indicates that our method generalizes better and can be transferred to the diagnosis of various diseases.
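For reference, the two reported metrics can be illustrated with a minimal sketch, assuming binary masks flattened into sequences of 0/1 values. The function name `dice_and_iou` and the convention of scoring two empty masks as perfect agreement are assumptions made for illustration, not details taken from the evaluation protocol of this paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as equal-length sequences of 0/1.

    Hypothetical helper for illustration only; not the paper's evaluation code.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy example: a 6-pixel prediction and ground truth that overlap on 2 pixels.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two scores are related by Dice = 2 IoU / (1 + IoU), so Dice is never lower than IoU.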