Segmentation
Computer science
Artificial intelligence
Pixel
Pattern recognition (psychology)
Feature (linguistics)
Similarity (geometry)
Context (archaeology)
Scale-space segmentation
Channel (broadcasting)
Scale (ratio)
Image segmentation
Image (mathematics)
Paleontology
Computer network
Philosophy
Linguistics
Physics
Quantum mechanics
Biology
Authors
Tianping Li, Zhaotong Cui, Han Yu, Guanxing Li, Meng Li, Dongmei Wei
Identifier
DOI:10.1007/s40747-023-01279-x
Abstract
Multi-scale representation provides an effective answer to the scale variation of objects and entities in semantic segmentation, and the ability to capture long-range pixel dependencies further facilitates the task. In addition, semantic segmentation requires the effective use of pixel-to-pixel similarity along the channel direction to enhance pixel regions. By reviewing the characteristics of earlier successful segmentation models, we identify a number of elements that are crucial to segmentation performance, including a robust encoder structure, multi-scale interactions, attention mechanisms, and a robust decoder structure. The attention mechanism of the asymmetric non-local neural network (ANNet) is merged with multi-scale pyramid modules to accelerate segmentation while maintaining high accuracy. However, ANNet does not account for the similarity between pixels along the channel direction of the feature map, which leaves its segmentation accuracy unsatisfactory. We therefore propose EMSNet, a straightforward convolutional network architecture for semantic segmentation that consists of an integration of enhanced regional module (IERM) and a multi-scale convolution module (MSCM). The IERM generates weights from the stage-four or stage-five feature maps and then fuses the input features with these weights for further computation; ANNet's auxiliary loss function is also used to compute the similarity of feature maps along the channel direction. The MSCM more accurately describes the interactions between channels, captures the interdependencies between feature pixels, and captures multi-scale context. Experiments on benchmark datasets show that the model performs well: it achieves 82.2% segmentation accuracy on the Cityscapes test data, and mIoU of 45.58% and 85.46% on the ADE20k and Pascal VOC datasets, respectively.
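The abstract does not give the exact layers of EMSNet or ANNet, but the asymmetric non-local attention it builds on can be illustrated with a minimal PyTorch sketch: queries are taken from every pixel, while keys and values are sampled from a small pyramid of pooled features, shrinking the affinity matrix from (HW x HW) to (HW x S) with S much smaller than HW. The class name PyramidSampledAttention, the pool sizes, and key_channels below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidSampledAttention(nn.Module):
    """Sketch of pyramid-sampled (asymmetric) non-local attention; hypothetical names."""

    def __init__(self, in_channels, key_channels, pool_sizes=(1, 3, 6, 8)):
        super().__init__()
        self.query = nn.Conv2d(in_channels, key_channels, 1)
        self.key = nn.Conv2d(in_channels, key_channels, 1)
        self.value = nn.Conv2d(in_channels, in_channels, 1)
        self.out = nn.Conv2d(in_channels, in_channels, 1)
        self.pool_sizes = pool_sizes
        self.scale = key_channels ** -0.5

    def _pyramid_sample(self, x):
        # Pool the map to a few small grids and flatten them into
        # S = sum(s * s) sampled spatial positions.
        n, c, _, _ = x.shape
        samples = [F.adaptive_avg_pool2d(x, s).reshape(n, c, -1)
                   for s in self.pool_sizes]
        return torch.cat(samples, dim=2)                            # (N, C, S)

    def forward(self, x):
        n, _, h, w = x.shape
        q = self.query(x).reshape(n, -1, h * w).transpose(1, 2)     # (N, HW, Ck)
        k = self._pyramid_sample(self.key(x))                       # (N, Ck, S)
        v = self._pyramid_sample(self.value(x)).transpose(1, 2)     # (N, S, C)
        attn = torch.softmax(torch.bmm(q, k) * self.scale, dim=-1)  # (N, HW, S)
        ctx = torch.bmm(attn, v).transpose(1, 2).reshape(n, -1, h, w)
        return x + self.out(ctx)                                    # residual fusion


if __name__ == "__main__":
    feat = torch.randn(2, 512, 64, 128)            # e.g. a stage-5 feature map
    block = PyramidSampledAttention(512, key_channels=256)
    print(block(feat).shape)                       # torch.Size([2, 512, 64, 128])

The channel-direction similarity and multi-scale convolution that EMSNet adds on top of this (IERM and MSCM) are not reproduced here; the sketch only shows why sampling keys and values from a pooled pyramid keeps attention affordable at segmentation resolutions.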