计算机科学
推论
变压器
人工智能
计算
棱锥(几何)
深度学习
蒸馏
模式识别(心理学)
语义鸿沟
机器学习
算法
图像(数学)
图像检索
电压
化学
物理
光学
有机化学
量子力学
作者
Gao Chen,Wei‐Hua Wang,Xingjian Li
标识
DOI:10.1016/j.infrared.2023.104671
摘要
Large-scale transformer networks can achieve state-of-the-art infrared small target detection accuracy, but the high computational resource consumption makes their inference speed unsatisfactory. Existing lightweight networks can certainly achieve real-time detection of infrared small targets, but no matter how carefully designed the lightweight strategy is, there is still a gap in the accuracy of these networks compared to large-scale networks. To address these problems, in this paper, we propose a lightweight infrared small target detection network capable of high-speed inference and a knowledge distillation method for learning higher-order semantic information from the transformer network. Specifically, based on depthwise separable dilated convolution, we design each stage in the detection network, called multi-scale dilated pyramid network (MDPNet), as a multi-branch parallel pyramid. This design can enlarge the receptive field and enhance the ability to extract contextual features of the network. Furthermore, we utilize knowledge distillation to bridge the detection performance gap with the transformer. Based on the self-attention mechanism, a semantic distillation sub-network is constructed between the teacher transformer and student convolution network, which enables a more efficient cross-model transfer of knowledge about higher-order semantic feature extraction between networks with different mechanisms and increases the detection accuracy without computation load. We demonstrate the rationality and effectiveness of the overall network design and learning approach through exhaustive experiments. On the widely accepted public datasets SIRST and NUDT-SIRST, nIoU reaches 74.88 and 75.10, and PD reaches 99.08 and 97.88, outperforming other networks. With the number of parameters of only 0.23 M, the network achieves an inference speed of 137 FPS for 320 × 320 infrared images.
科研通智能强力驱动
Strongly Powered by AbleSci AI