C3TB-YOLOv5: integrated YOLOv5 with transformer for object detection in high-resolution remote sensing images

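Computer vision; Remote sensing; Computer science; Artificial intelligence; Transformer; High resolution; Geography; Engineering; Electrical engineering; Voltage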

Authors

Qinggang Wu, Yang Li, Wei Huang, Qiqiang Chen, Yonglei Wu

Source

Journal: International Journal of Remote Sensing [Informa]
Date: 2024-04-03 Volume (issue): 45 (8): 2622-2650 Citations: 3

Identifier

DOI: 10.1080/01431161.2024.2329528

Abstract

In object detection from high-resolution remote sensing images (HRRSIs), existing YOLOv5 methods encounter several challenges, including dense object arrangements, small object sizes, and complex backgrounds. To tackle these challenges, we propose a novel approach called C3TB-YOLOv5, which combines traditional YOLOv5 with the Transformer model to detect objects in HRRSIs. Unlike conventional YOLOv5 methods that primarily focus on capturing local information from remote sensing scenes, our C3TB-YOLOv5 method incorporates global information through the introduction of a new C3TB module. This module, based on the Transformer multi-head attention mechanism (AM), consists of two branches that extract local and global information from feature maps. By integrating these branches and establishing long-range relationships, our method successfully detects densely arranged small objects in HRRSIs. Furthermore, to improve the accuracy of tiny object detection, a novel detection head has been developed to effectively utilize the unused C3 module, thereby preventing the loss of fine-grained textures and positional features. In addition, we integrate an enhanced SimAM, namely Sim-GMP, into the model to adjust the focus across varying regions, effectively distinguishing the features of objects of interest from complex backgrounds. Finally, to address the problem of sample imbalance in remote sensing object detection, the most recent Wise-IoU v3 loss function is employed to improve the accuracy of anchor box predictions for objects. To maintain a high object detection speed, the most critical C3 modules are substituted with the proposed C3TB module to strike a good balance between object detection accuracy and model compactness. Extensive experiments conducted on two remote sensing datasets, NWPU VHR-10 and VisDrone 2019, demonstrate that our method achieves better object detection performance than state-of-the-art methods.
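
The abstract names SimAM as the attention mechanism that Sim-GMP enhances, but it does not describe the enhancement itself. As a rough illustration only, the sketch below computes plain, parameter-free SimAM weights for a single feature-map channel in pure Python, following the published SimAM formulation. The function names `simam_weights` and `apply_simam` are hypothetical, the `e_lambda` default of 1e-4 is an assumed regularization value, and nothing here should be read as the authors' Sim-GMP or their code.

```python
import math


def simam_weights(channel, e_lambda=1e-4):
    """Per-pixel SimAM attention weights for one channel (a 2D list of floats).

    Assumption: this is plain SimAM, not the Sim-GMP variant from the paper.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # SimAM normalizes by (H * W - 1)
    if n < 1:
        raise ValueError("channel needs at least two pixels")

    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / n

    weights = []
    for row in channel:
        weight_row = []
        for v in row:
            # Inverse energy: pixels far from the channel mean score higher.
            inv_energy = (v - mean) ** 2 / (4.0 * (variance + e_lambda)) + 0.5
            # Sigmoid squashes the score into (0, 1).
            weight_row.append(1.0 / (1.0 + math.exp(-inv_energy)))
        weights.append(weight_row)
    return weights


def apply_simam(channel, e_lambda=1e-4):
    """Rescale each pixel by its SimAM weight."""
    weights = simam_weights(channel, e_lambda)
    return [
        [v * w for v, w in zip(row, weight_row)]
        for row, weight_row in zip(channel, weights)
    ]


if __name__ == "__main__":
    # Toy 3x3 channel: one bright pixel on a flat background.
    toy = [[1.0, 1.0, 1.0], [1.0, 9.0, 1.0], [1.0, 1.0, 1.0]]
    for weight_row in simam_weights(toy):
        print([round(w, 3) for w in weight_row])
```

In this toy channel the bright centre pixel receives a larger weight than the flat background, which is the sense in which this kind of attention can help separate objects from their surroundings. A real detector would apply the same computation per channel on tensors rather than on Python lists.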
