Skin lesion segmentation plays a prominent role in computer-aided diagnosis systems for skin cancer, and convolutional neural network (CNN) approaches have achieved remarkable success in this task. However, the task still faces difficult challenges, such as variable lesion shapes and blurred lesion boundaries. To address these challenges, past research has employed cutting-edge mechanisms, including diverse attention modules. Inspired by state-of-the-art works, this study proposes a Dual Encoder framework with a Text-Guided Attention Network (DETA-Net), which can accurately and efficiently segment lesions of varying shape with blurred boundaries. First, we design a multi-scale joint encoder that takes advantage of both CNNs and Transformers to extract features when the lesion is blurred against its background. In addition, we introduce text-guided attention, which uses text-based embeddings to guide classification in DETA-Net so that variation in the size and number of lesions can be accommodated efficiently. Experimental results demonstrated that DETA-Net outperformed state-of-the-art methods on multiple skin lesion datasets with variable-sized lesions. We also evaluated the effectiveness of DETA-Net through extensive ablation studies on three datasets: ISIC 2016, ISIC 2018, and PH2. The baseline achieved Dice scores of 0.8838 on ISIC 2016, 0.8864 on ISIC 2018, and 0.8695 on PH2.
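For readers unfamiliar with the Dice score reported above, the following is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), applied to two binary masks. It is not the evaluation code used in this study. The function name `dice_coefficient`, the list-of-lists mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
from itertools import chain


def dice_coefficient(pred_mask, true_mask):
    """Dice overlap between two binary masks given as 2D lists of 0/1 values.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Flatten both masks to 1D lists of booleans.
    pred = [bool(v) for v in chain.from_iterable(pred_mask)]
    true = [bool(v) for v in chain.from_iterable(true_mask)]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")

    # |A ∩ B|: pixels marked as lesion in both masks.
    intersection = sum(p and t for p, t in zip(pred, true))
    # |A| + |B|: lesion pixels counted in each mask separately.
    total = sum(pred) + sum(true)

    if total == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[0, 1, 1],
                 [0, 1, 0]]
    reference = [[0, 1, 0],
                 [0, 1, 1]]
    # Two overlapping pixels, three lesion pixels in each mask: 2*2 / (3+3).
    print(round(dice_coefficient(predicted, reference), 4))  # 0.6667
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no lesion pixels. Dataset-level scores such as those above are typically averaged over images, although the averaging scheme is not stated here.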