Lidar
Remote sensing
Feature extraction
Computer science
Artificial intelligence
Feature (linguistics)
Convolutional neural network
Sensor fusion
Computer vision
Segmentation
Pattern recognition (psychology)
Data mining
Geology
Linguistics
Philosophy
Authors
Hui Luo, Xibo Feng, Bo Du, Yuxiang Zhang
Identifier
DOI: 10.1109/tgrs.2024.3389110
Abstract
Building extraction from remote sensing images is extremely important for urban planning, land-cover change analysis, disaster monitoring, and other applications. With the growing diversity of building features, shapes, and textures, coupled with frequent shadowing and occlusion, high-resolution remote sensing imagery (HRI) alone has limitations for building extraction. Feature fusion using multisource data has therefore become one of the most popular approaches. However, the distinct characteristics and noise of each data source make effective fusion and utilization difficult, so fully fusing multisource data to exploit their complementary advantages remains challenging. In this paper, we propose an end-to-end multimodal feature fusion building extraction network based on SegFormer, which fuses HRI and LiDAR data for building extraction. First, we use the SegFormer encoder to overcome the restricted receptive field of traditional convolutional neural networks and thereby extract features of complex buildings effectively. In addition, we propose a cross-modal feature fusion (CMFF) method that uses the self-attention mechanism to fuse the multisource data. In the decoder, we propose a multi-scale up-sampling decoder (MSUD) strategy to fully fuse multi-level features. Experiments on three datasets show that our model outperforms several multisource building extraction and semantic segmentation models, reaching building IoU of 91.80%, 93.03%, and 84.59%, respectively. Ablation experiments further validate the effectiveness of each strategy.
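The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the two ideas it names: attention-based cross-modal fusion of HRI and LiDAR encoder features, and a decoder that upsamples and fuses multi-level features. All class names, channel widths, and design choices below are illustrative assumptions, not the authors' actual CMFF or MSUD modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalFusion(nn.Module):
    """Hypothetical sketch of attention-based HRI/LiDAR fusion.

    Each modality's feature map attends to the other's, and the two
    attended sequences are concatenated and projected back to the
    original channel width. Illustrative only, not the paper's CMFF.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn_hri = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_lidar = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * channels, channels)

    def forward(self, hri: torch.Tensor, lidar: torch.Tensor) -> torch.Tensor:
        # Inputs: (B, C, H, W) feature maps from the two encoder branches.
        b, c, h, w = hri.shape
        hri_seq = hri.flatten(2).transpose(1, 2)      # (B, H*W, C)
        lidar_seq = lidar.flatten(2).transpose(1, 2)  # (B, H*W, C)
        # HRI queries attend to LiDAR keys/values, and vice versa.
        hri_att, _ = self.attn_hri(hri_seq, lidar_seq, lidar_seq)
        lidar_att, _ = self.attn_lidar(lidar_seq, hri_seq, hri_seq)
        fused = self.proj(torch.cat([hri_att, lidar_att], dim=-1))
        return fused.transpose(1, 2).reshape(b, c, h, w)


class MultiScaleUpsampleDecoder(nn.Module):
    """Toy multi-scale decoder: project each pyramid level to a common
    width, upsample to the finest resolution, concatenate, and fuse
    with a 3x3 convolution. Again an assumption, not the paper's MSUD.
    """

    def __init__(self, in_channels, out_channels: int):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        self.fuse = nn.Conv2d(len(in_channels) * out_channels, out_channels,
                              kernel_size=3, padding=1)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) maps, finest resolution first.
        size = feats[0].shape[-2:]
        ups = [F.interpolate(lat(f), size=size, mode="bilinear",
                             align_corners=False)
               for lat, f in zip(self.lateral, feats)]
        return self.fuse(torch.cat(ups, dim=1))


# Usage with channel widths borrowed from SegFormer's MiT encoder
# (64/128/320); the shapes here are arbitrary examples.
fusion = CrossModalFusion(channels=64)
decoder = MultiScaleUpsampleDecoder([64, 128, 320], out_channels=64)
f1 = fusion(torch.randn(2, 64, 64, 64), torch.randn(2, 64, 64, 64))
f2 = torch.randn(2, 128, 32, 32)
f3 = torch.randn(2, 320, 16, 16)
out = decoder([f1, f2, f3])  # (2, 64, 64, 64)
```

The cross-attention pairing (each modality querying the other) is one common way to realize the "self-attention for multisource fusion" the abstract describes; the paper itself may arrange queries, keys, and values differently.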