Computer science
Artificial intelligence
Encoder
Pixel
Location
Transformer
Pattern recognition (psychology)
Algorithm
Theoretical computer science
Linguistics
Quantum mechanics
Operating system
Physics
Philosophy
Voltage
Authors
Mingjin Zhang, Haichen Bai, Jing Zhang, Rui Zhang, Chaoyue Wang, Jie Guo, Xinbo Gao
Identifier
DOI:10.1145/3503161.3547817
Abstract
Infrared small target detection (IRSTD) refers to segmenting small targets from infrared images, which is of great significance in practical applications. However, due to the small scale of the targets as well as noise and clutter in the background, current deep neural network-based methods struggle to extract features with discriminative semantics while preserving fine details. In this paper, we address this problem by proposing a novel RKformer model with an encoder-decoder structure, where four specifically designed Runge-Kutta transformer (RKT) blocks are stacked sequentially in the encoder. Technically, it has three key designs. First, we adopt a parallel encoder block (PEB) of transformer and convolution to exploit their respective strengths in long-range dependency modeling and locality modeling, extracting semantics while preserving details. Second, we propose a novel random-connection attention (RCA) block, which has a reservoir structure to learn sparse attention via random connections during training. RCA encourages the target to attend to a sparse set of relevant positions instead of all the large-area background pixels, resulting in more informative attention scores. It has fewer parameters and computations than the original self-attention in the transformer while performing better. Third, inspired by neural ordinary differential equations (ODEs), we stack two PEBs with several residual connections as the basic encoder block to implement the Runge-Kutta method for solving ODEs, which can effectively enhance features and suppress noise. Experiments on the public NUAA-SIRST and IRSTD-1k datasets demonstrate the superiority of the RKformer over state-of-the-art methods.
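The abstract's third design point wires two PEBs together as one explicit Runge-Kutta integration step over the feature map. The sketch below illustrates that idea in PyTorch under stated assumptions: the class names (PEB, RKTBlock), channel sizes, the Heun-style second-order update, and the use of standard multi-head self-attention in place of the paper's random-connection attention (RCA) are illustrative placeholders, not the authors' released implementation.

```python
# Minimal sketch of a parallel encoder block and a Runge-Kutta-style stacking
# of two such blocks. Assumption: RCA is approximated here by ordinary
# multi-head self-attention; the residual wiring follows a second-order
# (Heun/midpoint) Runge-Kutta step, which is one plausible reading of the
# abstract, not the paper's exact formulation.
import torch
import torch.nn as nn


class PEB(nn.Module):
    """Parallel encoder block: a convolution branch (locality / fine details)
    in parallel with a self-attention branch (long-range dependencies)."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Placeholder for the paper's random-connection attention (RCA).
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.conv(x)
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)  # global context
        global_feat = self.norm(attn_out).transpose(1, 2).reshape(b, c, h, w)
        return local + global_feat                       # fuse both branches


class RKTBlock(nn.Module):
    """Two PEBs composed as one explicit Runge-Kutta (Heun-style) step:
    x_{n+1} = x_n + (k1 + k2) / 2, with k1 = f1(x_n), k2 = f2(x_n + k1)."""

    def __init__(self, channels: int):
        super().__init__()
        self.f1 = PEB(channels)
        self.f2 = PEB(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k1 = self.f1(x)
        k2 = self.f2(x + k1)
        return x + 0.5 * (k1 + k2)


if __name__ == "__main__":
    feat = torch.randn(1, 32, 64, 64)      # dummy feature map (B, C, H, W)
    out = RKTBlock(32)(feat)
    print(out.shape)                       # torch.Size([1, 32, 64, 64])
```

In this reading, the RK-style update averages the two PEB residuals rather than simply chaining them, which is what gives the block its ODE-solver interpretation; the encoder would stack four such blocks sequentially as described in the abstract.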