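Computer science
Segmentation
Convolutional neural network
Artificial intelligence
Transformer
Image segmentation
Computer vision
Pattern recognition (psychology)
Quantum mechanics
Physics
Voltage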
Authors
Bing Liu, Huizhu Wu, Xueliang Bao, Zhaohao Zhong
Identifier
DOI: 10.1109/cvidl58838.2023.10167334
Abstract
Semantic segmentation, a fundamental task in computer vision, has developed rapidly in recent years. Semantic segmentation of remote sensing urban scene images, which supports tasks such as land cover mapping, urban change detection, environmental protection, and economic assessment, has also received considerable attention. To extract global semantic features, recent research has focused on combining CNNs with Transformers, which have great potential for modeling global information, in semantic segmentation models. However, hybrid CNN and Vision Transformer methods still suffer from high latency, and the MLP layers of the Vision Transformer introduce a large number of parameters. To address these issues, we propose a lightweight pure CNN UNet (LPCUNet), which uses a large convolutional kernel to capture global context. In addition, a simple fusion module dynamically fuses local and global features. Extensive experiments show that the proposed method achieves state-of-the-art performance with lower latency. Specifically, LPCUNet achieves mean Intersection over Union (mIoU) scores of 83.3% and 86.4% on the Vaihingen and Potsdam datasets, respectively.
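The abstract reports results as mean Intersection over Union (mIoU). As a minimal sketch of how that metric is commonly computed, assuming integer class labels per pixel and per-class IoU = TP / (TP + FP + FN) averaged over the classes that occur, the NumPy code below builds a confusion matrix and averages the per-class scores. The function names are hypothetical, and the abstract does not state the exact evaluation protocol used on Vaihingen and Potsdam (for example, which classes are included).

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Confusion matrix with rows = ground-truth class, columns = predicted class."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # Encode each (target, pred) pair as a single index and count occurrences.
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(pred, target, num_classes):
    """Return (mIoU, per-class IoU) with IoU = TP / (TP + FP + FN)."""
    cm = confusion_matrix(pred, target, num_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp  # belongs to the class, but missed
    denom = tp + fp + fn
    present = denom > 0       # skip classes absent from both prediction and label
    iou = tp[present] / denom[present]
    return iou.mean(), iou


# Usage: a 2x3 label map with 3 classes.
target = [[0, 0, 1], [1, 2, 2]]
pred = [[0, 1, 1], [1, 2, 0]]
miou, per_class = mean_iou(pred, target, num_classes=3)
# per-class IoU = 1/3, 2/3, 1/2, so mIoU = 0.5
```

Building the confusion matrix once with `np.bincount` keeps the computation linear in the number of pixels, and the same matrix can be accumulated across many images before computing IoU, which is how dataset-level scores are usually reported.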
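The central idea in the abstract, a large convolutional kernel for global context plus a module that dynamically fuses local and global features, can be illustrated with a small NumPy sketch. This is not the LPCUNet implementation: the abstract gives neither the kernel size nor the fusion design, so the 3×3 local kernel, the 13×13 large kernel, the depthwise formulation, and the sigmoid channel gate in the hypothetical `gated_fusion` function are all assumptions chosen for illustration.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Depthwise 'same' convolution (cross-correlation, as in deep-learning libraries).

    x: array of shape (C, H, W); kernels: array of shape (C, k, k) with odd k.
    Each channel is filtered only by its own kernel, with zero padding.
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    assert kernels.shape == (c, k, k) and k % 2 == 1
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H, W
    out = np.zeros((c, h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Add the shifted input weighted by tap (i, j) of every channel's kernel.
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_fusion(local_feat, global_feat):
    """Hypothetical dynamic fusion: a per-channel gate from global average pooling.

    The output is a convex combination gate * global + (1 - gate) * local.
    """
    pooled = (local_feat + global_feat).mean(axis=(1, 2))  # shape (C,)
    gate = sigmoid(pooled)[:, None, None]                  # shape (C, 1, 1), in (0, 1)
    return gate * global_feat + (1.0 - gate) * local_feat


# Usage: a 4-channel 32x32 feature map through both branches, then fused.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32, 32))
local = depthwise_conv2d(x, rng.standard_normal((4, 3, 3)) / 9.0)  # small kernel
glob = depthwise_conv2d(x, np.full((4, 13, 13), 1.0 / 169.0))      # large kernel
fused = gated_fusion(local, glob)
print(fused.shape)  # (4, 32, 32)
```

A single 13×13 depthwise layer lets every output pixel see a 13×13 neighborhood, whereas 3×3 kernels need six stacked layers to reach the same receptive field. Using depthwise weights keeps the cost at C·k² weights per layer, which is one plausible way a pure-CNN design can stay lightweight while capturing wider context.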