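Computer science
Object detection
Feature (linguistics)
Artificial intelligence
Pyramid (geometry)
Encoder
Pattern recognition (psychology)
Hourglass
Feature extraction
Convolution (computer science)
Computer vision
Field (mathematics)
Residual
Artificial neural network
Algorithm
Mathematics
Archaeology
History
Philosophy
Linguistics
Geometry
Pure mathematics
Operating system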
Authors
Shaobo Wang,Renhai Chen,Hongyue Wu,Xiaozhe Li,Zhiyong Feng
Identifier
DOI:10.1109/tip.2024.3374225
Abstract
Multi-scale detection based on Feature Pyramid Networks (FPN) is a popular way to improve accuracy in object detection. However, using multi-layer features in the decoder of FPN methods requires many convolution operations on high-resolution feature maps, which consumes significant computational resources. In this paper, we propose a novel perspective on FPN in which fused single-layer features are used directly for regression and classification. Our proposed model, You Only Look One Hourglass (YOLOH), fuses multiple feature maps into one feature map in the encoder. We then use dense connections and dilated residual blocks to expand the receptive field of the fused feature map. The resulting output not only contains information from all the feature maps but also has a multi-scale receptive field for detection. Experimental results on the COCO dataset demonstrate that YOLOH achieves higher accuracy and better run-time performance than established detector baselines. For instance, it achieves an average precision (AP) of 50.2 on a standard 3× training schedule, and 40.3 AP at a speed of 32 FPS on the ResNet-50 model. We anticipate that YOLOH can serve as a reference for researchers designing real-time detectors in future studies. Our code is available at https://github.com/wsb853529465/YOLOH-main.
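The claim that dilated residual blocks give a single feature map a multi-scale receptive field can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the YOLOH code: it assumes each block contains one stride-1 3×3 dilated convolution with an identity skip connection, and the dilation rates (2, 4, 6, 8) are a hypothetical choice, not values stated in the abstract. Under those assumptions, a stride-1 convolution with kernel size k and dilation d widens the receptive field by (k - 1)·d, and each skip connection lets the signal bypass its block, so every subset of blocks contributes its own receptive field to the output.

```python
from itertools import combinations


def stacked_receptive_field(dilations, kernel_size=3):
    """Receptive field (in positions of the fused map) of stride-1 dilated
    convolutions applied one after another."""
    rf = 1
    for d in dilations:
        # Each stride-1 conv widens the receptive field by (k - 1) * dilation.
        rf += (kernel_size - 1) * d
    return rf


def residual_receptive_fields(dilations, kernel_size=3):
    """Receptive fields present in the output when every block has an
    identity skip: each subset of blocks is one path through the stack."""
    fields = set()
    for r in range(len(dilations) + 1):
        for path in combinations(dilations, r):
            fields.add(stacked_receptive_field(path, kernel_size))
    return sorted(fields)


# Hypothetical dilation rates, chosen for illustration only.
ASSUMED_DILATIONS = (2, 4, 6, 8)

print(stacked_receptive_field(ASSUMED_DILATIONS))    # longest path only
print(residual_receptive_fields(ASSUMED_DILATIONS))  # all paths
```

With these assumed rates, the longest path covers 41 positions, while the skip connections add every smaller odd-spaced size from 1 upward (1, 5, 9, ..., 41). This is the sense in which one fused map can serve objects of several scales without a multi-level decoder.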