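Object detection
Artificial intelligence
Computer science
Aerial imagery
Computer vision
Detector
Convolutional neural network
Minimum bounding box
Margin (machine learning)
Transformer
Pattern recognition (psychology)
Image (mathematics)
Machine learning
Engineering
Telecommunications
Voltage
Electrical engineering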
Identifier
DOI:10.1109/igarss52108.2023.10282836
Abstract
The past few years have seen increased interest in aerial image object detection because of its critical value to large-scale geoscientific research such as environmental studies, urban planning, and intelligence monitoring. However, the task is very challenging due to the bird’s-eye perspective, complex backgrounds, large and varied image sizes, and the scarcity of well-annotated datasets. Recent advances in computer vision have shown promise in tackling this challenge. Specifically, the Vision Transformer Detector (ViTDet) was proposed to extract multi-scale features for object detection. Empirical results show that ViTDet’s simple design achieves good performance on natural scene images and can be easily embedded into any detector architecture. To date, ViTDet’s potential benefit to challenging aerial image object detection has not been explored. We therefore carried out experiments to evaluate the effectiveness of ViTDet for aerial image object detection on three well-known datasets: Airbus Aircraft, RarePlanes, and Dataset of Object DeTection in Aerial images (DOTA). Our results show that ViTDet can consistently outperform its convolutional neural network counterparts on object detection by a large margin (up to 17% in average precision) and that it achieves competitive performance for oriented bounding box (OBB) object detection.