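Computer science
Segmentation
Artificial intelligence
Convolutional neural network
Inference
Deep learning
Encoder
Transformer
Feature learning
Image segmentation
Machine learning
Pattern recognition (psychology)
Computer vision
Engineering
Operating system
Electrical engineering
Voltage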
Authors
Libo Wang,Shenghui Fang,Ce Zhang,Rui Li,Chenxi Duan
Source
Journal: Cornell University - arXiv
Date: 2021-09-18
Citations: 2
Abstract
Semantic segmentation of fine-resolution urban scene images plays a vital role in a wide range of practical applications, such as land cover mapping, urban change detection, environmental protection and economic assessment. Driven by rapid developments in deep learning technologies, the convolutional neural network (CNN) has dominated the semantic segmentation task for many years. Convolutional neural networks adopt hierarchical feature representation, demonstrating strong local information extraction. However, the local property of the convolution layer limits the network's ability to capture the global context that is crucial for precise segmentation. Recently, the Transformer has become a hot topic in the computer vision domain. The Transformer demonstrates a great capability for global information modelling, boosting many vision tasks, such as image classification, object detection and especially semantic segmentation. In this paper, we propose an efficient hybrid Transformer (EHT) for real-time urban scene segmentation. The EHT adopts a hybrid structure with a CNN-based encoder and a Transformer-based decoder, learning global-local context at a lower computational cost. Extensive experiments demonstrate that our EHT has faster inference speed with competitive accuracy compared with state-of-the-art lightweight models. Specifically, the proposed EHT achieves 66.9% mIoU on the UAVid test set and outperforms other benchmark networks significantly. The code will be available soon.
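For readers unfamiliar with the mIoU figure quoted in the abstract, the following is a minimal sketch of how mean intersection-over-union is commonly computed for a segmentation result. It is not the authors' evaluation code: the function name `mean_iou`, the flat-list label format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the exact UAVid evaluation protocol is not described in this record.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred, target: flat sequences of integer class labels, one per pixel
    (hypothetical input format chosen for this sketch).
    """
    intersection = [0] * num_classes   # pixels where both agree on class c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: ignore classes absent from both inputs
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoU values are 1/3, 2/3 and 1/2, so `score` is 0.5. Benchmark evaluations typically accumulate the intersection and union counts over the whole test set before dividing, rather than averaging per-image scores.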