Echocardiography is an essential diagnostic method for assessing cardiac function. However, manually labelling the left ventricle region on echocardiography images is time-consuming and subject to observer bias. Therefore, it is vital to develop a high-performance and efficient automatic assessment tool. Inspired by the success of transformer architectures in vision tasks, we develop a lightweight segmentation model named ‘TransBridge’. This hybrid framework combines a convolutional neural network (CNN) encoder-decoder with a transformer. The transformer layers bridge the CNN encoder and decoder, fusing the multi-level features extracted by the encoder and building global and inter-level dependencies. A new patch embedding layer, built on dense patch division and shuffled group convolution, reduces both the number of parameters in the embedding layer and the length of the token sequence. The model is evaluated on the EchoNet-Dynamic dataset for left ventricle segmentation. Experimental results show that the total number of parameters is reduced by 78.7% compared with CoTr [22] while the Dice coefficient reaches 91.4%, demonstrating the effectiveness of the proposed structure.
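
To make the embedding idea concrete, below is a minimal PyTorch sketch of a patch embedding layer built from a grouped convolution followed by a channel shuffle, the two ingredients the abstract attributes to TransBridge. This is an illustrative assumption rather than the authors' implementation; the group count, patch size, channel sizes, and class names are hypothetical choices.

```python
# Illustrative sketch (not the paper's exact code): patch embedding via a
# grouped convolution whose kernel/stride equal the patch size, followed by
# a channel shuffle so information still mixes across groups.
import torch
import torch.nn as nn


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups after a grouped convolution."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)


class ShuffledGroupPatchEmbedding(nn.Module):
    """Project a CNN feature map into a token sequence; grouping the
    projection convolution cuts its parameter count by the group factor."""

    def __init__(self, in_channels: int, embed_dim: int,
                 patch_size: int = 2, groups: int = 4):
        super().__init__()
        self.groups = groups
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size,
                              groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                      # (N, embed_dim, H/p, W/p)
        x = channel_shuffle(x, self.groups)   # mix channels across groups
        return x.flatten(2).transpose(1, 2)   # (N, num_tokens, embed_dim)


if __name__ == "__main__":
    feats = torch.randn(1, 64, 28, 28)        # e.g. one CNN encoder stage
    tokens = ShuffledGroupPatchEmbedding(64, 128)(feats)
    print(tokens.shape)                       # torch.Size([1, 196, 128])
```

Compared with a dense projection, the grouped convolution reduces the embedding layer's weights by roughly the group factor, which is the kind of saving the abstract claims for the proposed embedding.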