Journal: IEEE Transactions on Geoscience and Remote Sensing [Institute of Electrical and Electronics Engineers] Date: 2023-01-01 Volume/Issue: 61: 1-14 Citations: 14
Identifier
DOI:10.1109/tgrs.2023.3292112
Abstract
High spatial resolution (HSR) remote sensing images inevitably pose the challenge of multi-scale variation, as small objects such as cars and helicopters may occupy only a few pixels. This poses a significant hurdle for global context modeling, particularly in backbone networks with large downsampling coefficients. Simple summation or concatenation techniques, such as skip connections, fail to address semantic gaps and can even harm multi-scale feature fusion. Meanwhile, due to the complexity of foreground objects, the boundary details of HSR remote sensing images are easily lost in sampling operations. To overcome these challenges, we propose a Multi-scale Channel-wise Cross Attention Network (MCCANet) assisted by boundary supervision. Technically, MCCA captures channel attention at various scales, which allows dynamic and adaptive feature fusion in a contextual scale-aware manner and focuses on both large and small objects distributed throughout the inputs. Besides, a Channel and Context Strainer (CCS) module is proposed and embedded in MCCA, filtering channels and contexts to mitigate intra-class differences. In addition, we apply a Boundary Supervision (BS) module to recover boundary contours, avoiding the blurring effect that arises during the construction of contextual information. The refined boundary allows for the effective recognition of surrounding pixels, ensuring better segmentation performance. Extensive experiments on the iSAID, ISPRS Potsdam, and LoveDA datasets demonstrate that our proposed MCCANet achieves a good balance of high accuracy and efficiency. Code will be available at: https://github.com/ZhengJianwei2/MCCANet.
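To make the idea of channel-wise, scale-aware fusion more concrete, the sketch below contrasts it with plain summation. It is not the MCCA module from the paper, whose details the abstract does not give. It is a parameter-free illustration that assumes the feature maps from different scales have already been resized to the same shape. The function names (`global_avg_pool`, `channel_weights`, `fuse`) and the choice of a softmax over pooled channel descriptors are assumptions made for this example. A practical implementation would use learned projections in a tensor library.

```python
import math


def global_avg_pool(feat):
    """Return the mean of each channel of a feature map stored as [C][H][W] lists."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def channel_weights(feats):
    """For every channel, compute a softmax over scales of the pooled descriptors.

    feats is a list of S feature maps with identical [C][H][W] shape.
    The result has shape [S][C]; for a fixed channel the weights sum to 1.
    """
    pooled = [global_avg_pool(f) for f in feats]  # shape [S][C]
    n_channels = len(pooled[0])
    weights = [[0.0] * n_channels for _ in feats]
    for c in range(n_channels):
        peak = max(p[c] for p in pooled)  # subtract the max for numerical stability
        exps = [math.exp(p[c] - peak) for p in pooled]
        total = sum(exps)
        for s, e in enumerate(exps):
            weights[s][c] = e / total
    return weights


def fuse(feats):
    """Fuse the scales as a per-channel weighted sum instead of a plain sum."""
    weights = channel_weights(feats)
    n_channels = len(feats[0])
    height, width = len(feats[0][0]), len(feats[0][0][0])
    return [
        [
            [
                sum(weights[s][c] * feats[s][c][h][w] for s in range(len(feats)))
                for w in range(width)
            ]
            for h in range(height)
        ]
        for c in range(n_channels)
    ]
```

Calling `fuse([fine, coarse])` on two equally shaped maps returns a map of the same shape. Each channel leans toward whichever scale has the stronger pooled response, whereas a skip connection would weight both scales equally.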