Lightweight Semantic Segmentation Network for Semantic Scene Understanding on Low-Compute Devices

计算机科学卷积（计算机科学）水准点（测量）分割集合（抽象数据类型）机器人卷积神经网络人工智能移动机器人人工神经网络试验装置语义学（计算机科学）算法程序设计语言大地测量学地理

作者

Hojun Son,James D. Weiland

标识

DOI：10.1109/iros55552.2023.10342110

摘要

Semantic scene understanding is beneficial for mobile robots. Semantic information obtained through onboard cameras can improve robots' navigation performance. However, obtaining semantic information on small mobile robots with constrained power and computation resources is challenging. We propose a new lightweight convolution neural network comparable to previous semantic segmentation algorithms for mobile applications. Our network achieved 73.06% on the Cityscapes validation set and 71.8% on the Cityscapes test set. Our model runs at 116 fps with $\mathbf{1024\mathrm{x}2048}$ , 172 fps with $1024\mathrm{x}1024$ , and 175 fps with $720\mathrm{x}960$ on NVIDIA GTX 1080. We analyze a model size, which is defined as the summation of the number of floating operations and the number of parameters. The smaller model size enables tiny mobile robot systems that should operate multiple tasks simultaneously to work efficiently. Our model has the smallest model size compared to the real-time semantic segmentation convolution neural networks ranked on Cityscapes real-time benchmark and other high performing, lightweight convolution neural networks. On the Camvid test set, our model achieved a mIoU of 73.29% with Cityscapes pre-training, which outperformed the accuracy of other lightweight convolution neural networks. For mobile applicability, we measured frame-per-second on different low-compute devices. Our model operates 35 fps on Jetson Xavier AGX, 21 FPS on Jetson Xavier NX, and 14 FPS on a ROS ASUS gaming phone. $1024\mathrm{x}2048$ resolution is used for the Jetson devices, and $512\mathrm{x}512$ size is utilized for the measurement on the phone. Our network did not use extra datasets such as ImageNet, Coarse Cityscapes, and Mapillary. Additionally, we did not use TensorRT to achieve fast inference speed. Compared to other real-time and lightweight CNNs, our model achieved significantly more efficiency while balancing accuracy, inference speed, and model size.

求助该文献

最长约 10秒，即可获得该文献文件

Lightweight Semantic Segmentation Network for Semantic Scene Understanding on Low-Compute Devices

今日热心研友