DDCTNet: A Deformable and Dynamic Cross Transformer Network for Road Extraction from High Resolution Remote Sensing Images

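Remote sensing, Computer science, High resolution, Transformer, Image resolution, Artificial intelligence, Computer vision, Geology, Engineering, Voltage, Electrical engineering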

Authors

Lipeng Gao,Yiqing Zhou,Jiangtao Tian,Wenjing Cai

Source

Journal: IEEE Transactions on Geoscience and Remote Sensing [Institute of Electrical and Electronics Engineers]
Date: 2024-01-01 Volume/Issue: 62: 1-19 Citations: 1

Links

ieee.org, doi.org

Identifiers

DOI: 10.1109/tgrs.2024.3404044

Abstract

Influenced by the concepts of deep learning, road extraction from high-resolution remote sensing scenes has gained significant attention. However, limitations remain in both metrics and practical application scenarios. To address these limitations, we propose a deformable and dynamic cross-transformer network (DDCTNet) with three key innovations. First, a deformable and dynamic cross-transformer (DDCT) attention module enhances the recovery of data and structural information during feature map upsampling by passing rich semantic information from the encoding stage to the decoding stage along the spatial and channel dimensions, respectively, which improves upsampling quality while preserving the inherent characteristics of roads. Second, a cross-scale strip-pooling axial attention (CSSA) module between discontinuous encoding stages alleviates the information loss caused by downsampling and highlights the linear characteristics of roads by leveraging rich semantic information from the previous stage; it accounts for linear road features in complex scenes while also reducing computational complexity. Finally, an auxiliary head (AuxHead) fuses the outputs of the last three decoding modules to improve the model's generalization performance and convergence speed. Extensive experiments on three benchmark datasets, including comparisons with other classic road extraction models, show a noticeable improvement of 1%-5% across various evaluation metrics on all three datasets. Additionally, the visualized results demonstrate that DDCTNet represents real road scenes more accurately, including distinguishing regions with high foreground-background similarity and handling road occlusion.
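
The strip-pooling idea named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' CSSA module: the abstract gives no layer details, so the function name `strip_pooling_gate`, the sigmoid gating, and the absence of any learned convolution or cross-scale fusion are all assumptions made only for illustration. The sketch shows why pooling along full rows and columns tends to favor long, thin structures such as roads.

```python
import numpy as np


def strip_pooling_gate(x: np.ndarray) -> np.ndarray:
    """Gate a feature map with row-wise and column-wise strip context.

    Hypothetical helper for illustration only (not from the paper).
    x has shape (C, H, W); the result has the same shape.
    """
    # Horizontal strips: average each row over its full width -> (C, H, 1)
    row_context = x.mean(axis=2, keepdims=True)
    # Vertical strips: average each column over its full height -> (C, 1, W)
    col_context = x.mean(axis=1, keepdims=True)
    # Broadcasting the two strips back to (C, H, W) gives every pixel
    # the summary of its own row plus the summary of its own column.
    context = row_context + col_context
    # Assumed gating step: squash the context to (0, 1) and reweight x.
    gate = 1.0 / (1.0 + np.exp(-context))
    return x * gate


# Toy input: one channel with a single horizontal "road" on row 1.
features = np.zeros((1, 4, 4))
features[0, 1, :] = 1.0
gated = strip_pooling_gate(features)
print(gated[0].round(3))
```

In this toy input the road row receives a strong row-wise context because the whole strip is active, whereas an isolated bright pixel would contribute only 1/W to its row average and 1/H to its column average.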
