Computer science
Convolutional neural network
Segmentation
Kernel (algebra)
Artificial intelligence
Transformer
Pattern recognition (psychology)
Convolution (computer science)
Image segmentation
Artificial neural network
Mathematics
Quantum mechanics
Combinatorics
Physics
Voltage
Authors
Weilin Liu, Lunqian Wang, Xinghua Wang, Hao Ding, Bo Xia, Zekai Zhang
Identifier
DOI: 10.1109/iecon51785.2023.10312040
Abstract
Since the advent of the Transformer, the Vision Transformer (ViT) model has become the dominant choice for extracting semantic information in remote sensing image semantic segmentation tasks. However, recent research has shown that using larger convolutional kernels to extract global information can achieve performance comparable to, or even better than, ViT models. This finding inspires us to redesign the convolutional neural network structure and to construct a novel parallel global-local module based on large kernel convolution, in order to better extract semantic information and local contextual information. Following this design approach, we propose ULKNet, a pure CNN architecture that achieves performance comparable to ViT models. In experiments on the ISPRS Vaihingen and Potsdam datasets, we achieve mIoU scores of 82.7% and 86.0%, respectively. These results demonstrate that a well-designed large-kernel CNN structure can provide a larger and more effective receptive field, thereby better extracting semantic information.
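The abstract does not spell out the internals of the parallel global-local module, so the following is only a minimal PyTorch sketch of the general idea it describes: a large-kernel depthwise convolution branch for global context running in parallel with a small-kernel branch for local context, fused and added back residually. The class name `ParallelGlobalLocalBlock`, the kernel sizes, and the fusion scheme are assumptions for illustration, not the paper's actual ULKNet design.

```python
# Hypothetical sketch of a parallel global-local block; the real ULKNet module may differ.
import torch
import torch.nn as nn

class ParallelGlobalLocalBlock(nn.Module):
    def __init__(self, channels: int, large_kernel: int = 13, small_kernel: int = 3):
        super().__init__()
        # Global branch: depthwise large-kernel conv gives a wide receptive field cheaply.
        self.global_branch = nn.Conv2d(
            channels, channels, kernel_size=large_kernel,
            padding=large_kernel // 2, groups=channels, bias=False)
        # Local branch: standard small-kernel conv captures local contextual detail.
        self.local_branch = nn.Conv2d(
            channels, channels, kernel_size=small_kernel,
            padding=small_kernel // 2, bias=False)
        # 1x1 conv fuses the two parallel branches back to the input width.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1, bias=False)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.global_branch(x)            # global semantic information
        l = self.local_branch(x)             # local contextual information
        out = self.fuse(torch.cat([g, l], dim=1))
        return x + self.act(self.norm(out))  # residual connection

if __name__ == "__main__":
    block = ParallelGlobalLocalBlock(channels=64)
    feat = torch.randn(1, 64, 128, 128)      # e.g. a remote sensing feature map
    print(block(feat).shape)                 # torch.Size([1, 64, 128, 128])
```

Stacking blocks like this in an encoder-decoder segmentation network is one plausible way a pure CNN could approach ViT-level receptive fields, which is the claim the abstract makes.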