Designing and learning a lightweight network for infrared small target detection via dilated pyramid and semantic distillation

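Computer science; Inference; Transformer; Artificial intelligence; Computation; Pyramid (geometry); Deep learning; Distillation; Pattern recognition (psychology); Semantic gap; Machine learning; Algorithm; Image (mathematics); Image retrieval; Voltage; Chemistry; Physics; Optics; Organic chemistry; Quantum mechanics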

Authors

Gao Chen,Wei‐Hua Wang,Xingjian Li

Source

Journal: Infrared Physics & Technology [Elsevier BV]
Date: 2023-03-29 Volume/issue: 131: 104671-104671 Citations: 2

Identifier

DOI: 10.1016/j.infrared.2023.104671

Abstract

Large-scale transformer networks achieve state-of-the-art accuracy in infrared small target detection, but their high computational cost makes inference slow. Existing lightweight networks can detect infrared small targets in real time, but however carefully the lightweight strategy is designed, their accuracy still lags behind that of large-scale networks. To address both problems, we propose a lightweight infrared small target detection network capable of high-speed inference, together with a knowledge distillation method for learning higher-order semantic information from a transformer network. Specifically, we design each stage of the detection network, called the multi-scale dilated pyramid network (MDPNet), as a multi-branch parallel pyramid built on depthwise separable dilated convolution. This design enlarges the receptive field and strengthens the network's ability to extract contextual features. Furthermore, we use knowledge distillation to close the detection performance gap with the transformer. A semantic distillation sub-network based on the self-attention mechanism is constructed between the teacher transformer and the student convolutional network; it enables more efficient cross-model transfer of knowledge about higher-order semantic feature extraction between networks with different mechanisms, and increases detection accuracy without adding computational load. We demonstrate the rationality and effectiveness of the overall network design and learning approach through extensive experiments. On the widely used public datasets SIRST and NUDT-SIRST, nIoU reaches 74.88 and 75.10 and PD reaches 99.08 and 97.88, respectively, outperforming other networks. With only 0.23 M parameters, the network achieves an inference speed of 137 FPS on 320 × 320 infrared images.
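
The abstract's two design claims, that dilation enlarges the receptive field and that depthwise separable convolution keeps the network small, can be illustrated with simple arithmetic. The sketch below is not the authors' implementation: the function names, the 3 × 3 kernel, the channel width of 16, and the dilation rates (1, 2, 3) are all hypothetical values chosen for illustration, and bias terms are ignored.

```python
from typing import Dict, List


def effective_kernel(k: int, dilation: int) -> int:
    """Side length of the area spanned by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of an ordinary k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def pyramid_summary(c_in: int, c_out: int, k: int, dilations: List[int]) -> Dict[str, object]:
    """Summarise a hypothetical stage with one parallel branch per dilation rate.

    Dilation changes the area a kernel spans but not its number of weights,
    so every branch has the same parameter count.
    """
    branches = len(dilations)
    return {
        "effective_kernels": [effective_kernel(k, d) for d in dilations],
        "separable_params": branches * separable_conv_params(c_in, c_out, k),
        "standard_params": branches * standard_conv_params(c_in, c_out, k),
    }


if __name__ == "__main__":
    # Assumed values for illustration only; the abstract does not specify them.
    print(pyramid_summary(c_in=16, c_out=16, k=3, dilations=[1, 2, 3]))
```

Under these assumed values, the three branches span 3, 5, and 7 pixels per side while using 1,200 weights in total, against 6,912 for the same branches built from standard convolutions. The figures say nothing about MDPNet's actual layer configuration, which the abstract does not give.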
