Towards High-accuracy and Real-time Two-stage Small Object Detection on FPGA

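Computer science; Field-programmable gate array; Object detection; Artificial intelligence; Computer vision; Object (grammar); Real-time computing; Pattern recognition (psychology); Embedded system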

Authors

S Y Li, Zhenhua Zhu, Hanbo Sun, Xuefei Ning, Guohao Dai, Yiming Hu, Huazhong Yang, Yu Wang

Source

Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]
Date: 2024-04-09 Volume (issue): 34 (9): 8053-8066

Identifier

DOI: 10.1109/tcsvt.2024.3385121

Abstract

Object detection via deep neural networks has advanced considerably in recent years. Yet detecting small objects, specifically those occupying only a few pixels (i.e., < 32² pixels), remains challenging compared with detecting large objects (i.e., > 96² pixels). Existing methods commonly apply high-resolution features or complex super-resolution strategies based on the two-stage Faster Region Convolutional Neural Network (RCNN), which sequentially applies localization and classification stages to a shared feature map extracted by a single backbone network. However, these methods suffer from low detection accuracy on small objects, high computational overhead, and wasted hardware resources. In this paper, we develop a high-accuracy and real-time small object detection system with negligible computational overhead and low hardware idleness. At the software level, we propose a two-stage Coarse-to-Fine Decoupling RCNN (CFD RCNN) with three techniques: (1) decoupling the shared backbone for localization and classification to achieve high accuracy on both tasks; (2) a training method that uses backbone feature upsampling for localization with low computational overhead; (3) an object cropping strategy that crops from the original high-resolution image for high-accuracy classification. At the hardware level, we propose a virtualized FPGA accelerator with a Dynamic Resource Allocation (DRA) strategy. The DRA strategy reallocates hardware resources according to the workload and resource preference of each stage in CFD RCNN to reduce hardware idleness. Extensive experiments on the TT100K and GTSDB datasets using a Xilinx ZCU102 FPGA show that the proposed small object detection system achieves a 2.9% improvement in mean average precision (mAP) compared with state-of-the-art (SOTA) algorithms and raises the throughput from 18.9 FPS to > 26.0 FPS (~1.37×) compared with existing accelerators.
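The abstract's size thresholds and its third technique (cropping objects from the original high-resolution image) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the (x1, y1, x2, y2) box convention, the "medium" label for sizes between the two thresholds, and the 512×512 and 2048×2048 resolutions in the usage note are all assumptions made for illustration. It only shows the general idea of localizing on a downscaled image and then mapping each box back to full-resolution coordinates before cropping.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) with x2/y2 exclusive.
Box = Tuple[float, float, float, float]

SMALL_AREA = 32 ** 2  # "small" threshold quoted in the abstract (< 32² pixels)
LARGE_AREA = 96 ** 2  # "large" threshold quoted in the abstract (> 96² pixels)


def size_category(width: float, height: float) -> str:
    """Classify an object by pixel area using the abstract's thresholds."""
    area = width * height
    if area < SMALL_AREA:
        return "small"
    if area > LARGE_AREA:
        return "large"
    return "medium"  # assumed label for everything in between


def map_box_to_original(
    box: Box,
    coarse_size: Tuple[int, int],    # (width, height) of the downscaled image
    original_size: Tuple[int, int],  # (width, height) of the original image
) -> Tuple[int, int, int, int]:
    """Scale a box found on the downscaled image to original-image pixels."""
    sx = original_size[0] / coarse_size[0]
    sy = original_size[1] / coarse_size[1]
    x1, y1, x2, y2 = box
    # Round outwards so the crop never cuts into the object, then clamp.
    ox1 = max(0, math.floor(x1 * sx))
    oy1 = max(0, math.floor(y1 * sy))
    ox2 = min(original_size[0], math.ceil(x2 * sx))
    oy2 = min(original_size[1], math.ceil(y2 * sy))
    return ox1, oy1, ox2, oy2


def crop(image: Sequence[Sequence[int]], box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Cut a box out of an image stored as a list of pixel rows."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]
```

For example, under the assumed resolutions, `map_box_to_original((10, 20, 15, 26), (512, 512), (2048, 2048))` maps a 5×6-pixel coarse box to a 20×24-pixel region of the original image, which `size_category(20, 24)` still labels as small. The crop taken from the original image therefore carries 16 times as many pixels as the same region in the downscaled image, which is the motivation for classifying from the high-resolution crop.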
