Unified Spatial-Frequency Modeling and Alignment for Multi-Scale Small Object Detection

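Keywords: Computer science; Scale (ratio); Object (grammar); Artificial intelligence; Remote sensing; Computer vision; Pattern recognition (psychology); Cartography; Geography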

Authors

Jing Liu, Ying Wang, Yanyan Cao, Chaoping Guo, Peijun Shi, Pan Li

Source

Journal: Symmetry [Multidisciplinary Digital Publishing Institute]
Date: 2025-02-06 Volume (issue): 17 (2): 242-242

Link

doi.org

Identifier

DOI: 10.3390/sym17020242

Abstract

Small object detection in aerial imagery remains challenging due to sparse feature representation, limited spatial resolution, and complex background interference. Current deep learning approaches enhance detection performance through multi-scale feature fusion, using convolutional operations to expand the receptive field or self-attention mechanisms to model global context. However, these methods rely primarily on spatial-domain features; self-attention introduces high computational cost, and conventional fusion strategies (e.g., concatenation or addition) often yield weak feature correlation or boundary misalignment. To address these challenges, we propose a unified spatial-frequency modeling and multi-scale alignment fusion framework, termed USF-DETR, for small object detection. The framework comprises three key modules: the Spatial-Frequency Interaction Backbone (SFIB), the Dual Alignment and Balance Fusion FPN (DABF-FPN), and the Efficient Attention-AIFI (EA-AIFI). The SFIB combines the Scharr operator, which extracts spatial edges and details, with FFT/IFFT, which captures frequency-domain patterns, to achieve a balanced fusion of global semantics and local details. The DABF-FPN employs bidirectional geometric alignment and adaptive attention to enhance the saliency of target regions, suppress background noise, and mitigate feature asymmetry across scales. The EA-AIFI streamlines the Transformer attention mechanism by removing key-value interactions and encoding query relationships via linear projections, significantly boosting inference speed and improving contextual modeling. Experiments on the VisDrone and TinyPerson datasets demonstrate the effectiveness of USF-DETR, which improves mAP over the baselines by 2.3% and 1.4%, respectively, while balancing accuracy and computational efficiency. The framework outperforms state-of-the-art methods in small object detection.
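
As a rough illustration of the spatial-frequency idea described for the SFIB, the sketch below computes a Scharr gradient magnitude (spatial edges) and an FFT/IFFT low-pass reconstruction (coarse frequency content) for a single-channel image, then blends the two. It is not the authors' implementation: the abstract does not specify how the two branches interact, so the function names, the `keep_fraction` cutoff, and the `alpha` blending weight are all assumptions made for illustration, using NumPy only.

```python
import numpy as np

# Standard 3x3 Scharr kernels for horizontal and vertical gradients.
SCHARR_X = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=np.float64)
SCHARR_Y = SCHARR_X.T


def filter3x3(image, kernel):
    """Cross-correlate a 2-D image with a 3x3 kernel, using edge padding."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def scharr_magnitude(image):
    """Spatial branch: gradient magnitude from the two Scharr responses."""
    gx = filter3x3(image, SCHARR_X)
    gy = filter3x3(image, SCHARR_Y)
    return np.hypot(gx, gy)


def fft_lowpass(image, keep_fraction=0.25):
    """Frequency branch: FFT, keep a centered low-frequency window, IFFT.

    keep_fraction is a hypothetical parameter, not taken from the paper.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    cy, cx = h // 2, w // 2
    rh = max(1, int(h * keep_fraction / 2))
    rw = max(1, int(w * keep_fraction / 2))
    mask = np.zeros((h, w), dtype=bool)
    mask[cy - rh:cy + rh + 1, cx - rw:cx + rw + 1] = True
    return np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real


def fuse(image, alpha=0.5):
    """Blend normalized edges with the low-pass image (assumed fusion rule)."""
    edges = scharr_magnitude(image)
    peak = edges.max()
    if peak > 0:
        edges = edges / peak
    return alpha * edges + (1.0 - alpha) * fft_lowpass(image)


if __name__ == "__main__":
    img = np.zeros((8, 8))
    img[:, 4:] = 1.0  # vertical step edge
    print(fuse(img).shape)
```

A fixed weighted sum is used here only to keep the example short; a learned, per-channel interaction inside a backbone would be the more realistic counterpart of what the abstract describes.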
