ABSTRACT
Breast cancer remains one of the most significant health threats to women, making precise segmentation of target tumors critical for early clinical intervention and postoperative monitoring. Although numerous convolutional neural networks (CNNs) and vision transformers have been developed to segment breast tumors from ultrasound images, both architectures struggle to model the long-range dependencies that are essential for accurate segmentation. Drawing inspiration from the Mamba architecture, we introduce the Vision Mamba-CNN U-Net (VMC-UNet) for breast tumor segmentation. This hybrid framework combines the long-range dependency modeling of Mamba with the detailed local representation power of CNNs. A key feature of our approach is a residual connection within the U-Net architecture that employs the visual state space (VSS) module to extract long-range dependency features from convolutional feature maps. In addition, to better integrate texture and structural features, we design a bilinear multi-scale attention module (BMSA), which strengthens the network's ability to capture and exploit fine-grained feature details across multiple scales. Extensive experiments on three public datasets show that the proposed VMC-UNet surpasses other state-of-the-art methods in breast tumor segmentation, achieving Dice coefficients of 81.52% on BUSI, 88.00% on BUS, and 88.96% on STU. The source code is available at https://github.com/windywindyw/VMC-UNet.
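To make the residual fusion of convolutional and state-space features concrete, the following is a minimal PyTorch sketch of one hybrid encoder stage. It is an illustration under stated assumptions, not the released implementation: `VSSBlockStub` stands in for the actual VSS module (which performs a 2-D selective scan, as in VMamba), and the names `ConvVSSResidual`, channel sizes, and layer choices are hypothetical.

```python
import torch
import torch.nn as nn


class VSSBlockStub(nn.Module):
    """Stand-in for the visual state space (VSS) block.

    The real VSS module performs a 2-D selective scan over the feature
    map; here a depthwise convolution with sigmoid gating approximates
    its interface so the sketch stays self-contained and runnable.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=3,
                                padding=1, groups=channels)
        self.gate = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> channel-last LayerNorm, then back
        y = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        return self.dwconv(y) * torch.sigmoid(self.gate(y))


class ConvVSSResidual(nn.Module):
    """One hybrid encoder stage: CNN features plus a VSS residual branch.

    The convolutional path captures local texture, the VSS branch models
    long-range dependencies over the same feature map, and the two are
    fused by residual addition (a hypothetical reading of the paper's
    residual connection method, not its exact code).
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.vss = VSSBlockStub(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.conv(x)
        # Local convolutional features plus global VSS context.
        return feat + self.vss(feat)


if __name__ == "__main__":
    stage = ConvVSSResidual(3, 32)
    out = stage(torch.randn(1, 3, 64, 64))
    print(out.shape)  # torch.Size([1, 32, 64, 64])
```

The residual form keeps the convolutional features intact while letting the VSS branch inject global context additively, which is one common way such hybrid CNN/state-space stages are wired.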