Computer science
Artificial intelligence
Encoder
Segmentation
Convolutional neural network
Decoding methods
Computer vision
Pattern recognition
Transformer
Feature extraction
Pooling
Image segmentation
Merging
Algorithm
Physics
Quantum mechanics
Voltage
Information retrieval
Operating systems
Authors
Wang Zhong-chen, Min Xia, Liguo Weng, Kai Hu, Haifeng Lin
Identifier
DOI: 10.1109/jstars.2023.3347595
Abstract
Although vision transformer-based methods (ViTs) outperform convolutional neural networks (CNNs) on image recognition tasks, their pixel-level semantic segmentation ability is limited because they do not explicitly exploit local inductive biases. Recently, a variety of hybrid ViT-CNN structures have been proposed, but these methods have poor multi-scale fusion ability and cannot accurately segment high-resolution, content-rich land cover remote sensing images. Therefore, a dual encoder-decoder network named DEDNet is proposed in this work. In the encoding stage, the local and global information of the image is extracted by parallel CNN and Transformer encoders. In the decoding stage, a cross-stage fusion (CF) module is constructed to provide neighborhood attention guidance, which enhances the localization of small targets and effectively avoids intra-class inconsistency. At the same time, a multi-head feature extraction (MFE) module is proposed to strengthen recognition of target boundaries and effectively avoid inter-class ambiguity. Before the output stage, a fusion spatial pyramid pooling (FSPP) classifier is proposed to merge the outputs of the two decoding strategies. The experiments demonstrate that the proposed model has superior generalization performance and can handle various land cover semantic segmentation tasks.
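The abstract describes a dual-encoder design: a CNN branch for local features and a Transformer branch for global context, fused before per-pixel decoding. The following is a minimal PyTorch sketch of that general pattern under stated assumptions; the CNNEncoder, TransformerEncoder, concat-and-project fusion, and decoder below are simplified stand-ins chosen for illustration and do not reproduce the paper's actual CF, MFE, or FSPP modules.

```python
# Minimal sketch of a dual-encoder segmentation network (illustrative only;
# not the authors' DEDNet implementation).
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Extracts local features with strided convolutions (downsamples 4x)."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class TransformerEncoder(nn.Module):
    """Extracts global context by attending over 4x-downsampled patch tokens."""
    def __init__(self, in_ch=3, dim=64, depth=2, heads=4):
        super().__init__()
        self.patch = nn.Conv2d(in_ch, dim, kernel_size=4, stride=4)  # patch embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
    def forward(self, x):
        t = self.patch(x)                      # (B, dim, H/4, W/4)
        b, c, h, w = t.shape
        t = t.flatten(2).transpose(1, 2)       # (B, H*W, dim) token sequence
        t = self.blocks(t)
        return t.transpose(1, 2).reshape(b, c, h, w)

class DualEncoderSegNet(nn.Module):
    """Parallel CNN + Transformer encoders, fused and decoded to per-pixel logits."""
    def __init__(self, num_classes=6, dim=64):
        super().__init__()
        self.cnn = CNNEncoder(dim=dim)
        self.vit = TransformerEncoder(dim=dim)
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)    # concat-then-project fusion
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, dim, 4, stride=4),        # upsample back to input size
            nn.Conv2d(dim, num_classes, kernel_size=1),       # per-pixel class logits
        )
    def forward(self, x):
        local_feat = self.cnn(x)
        global_feat = self.vit(x)
        fused = self.fuse(torch.cat([local_feat, global_feat], dim=1))
        return self.decoder(fused)

if __name__ == "__main__":
    model = DualEncoderSegNet(num_classes=6)
    logits = model(torch.randn(1, 3, 256, 256))
    print(logits.shape)  # torch.Size([1, 6, 256, 256])
```

In this sketch both branches produce feature maps at the same 1/4 resolution so they can be concatenated directly; the paper's cross-stage and pyramid-pooling fusion strategies would replace the single 1x1 convolution used here.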