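Computer science
Artificial intelligence
Object detection
Computer vision
Object (grammar)
Viola–Jones object detection framework
Pattern recognition (psychology)
Cognitive neuroscience of visual object recognition
Convolutional neural network
Authors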
Zili Liu, Tu Zheng, Guodong Xu, Zheng Yang, Haifeng Liu, Deng Cai
Identifier
DOI:10.1016/j.neucom.2020.12.055
Abstract
Modern object detectors rarely achieve short training time, fast inference speed, and high accuracy at the same time. To strike a balance among the three, we propose the single-scale TTFNet and the multi-scale TTFNeXt. Both use light-head, single-stage, anchor-free designs, which enable fast inference. We then focus on reducing training time and improving accuracy. We observe that encoding more training samples from each annotated box plays a role similar to increasing the batch size, which allows a larger learning rate and accelerates training. To this end, we introduce a dense regression approach based on Gaussian kernels. We also show experimentally that deformable convolutions in our single-scale detector are not sufficient to handle the scale-variation problem, so we extend the single-scale detector to a multi-scale version. Because the multi-scale design yields redundant detections from different pyramid levels, we introduce a cross-level NMS algorithm to eliminate them efficiently. Experiments on MS COCO show that TTFNet and TTFNeXt offer a strong balance of training time, inference speed, and accuracy. Compared with previous real-time detectors, they train more than three times faster at similar detection accuracy and with faster inference. When trained for 120 epochs, TTFNeXt achieves 33.7 AP at 99 FPS and 41.8 AP at 40 FPS on a single GTX 1080Ti.
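The idea of encoding more training samples from each annotated box with a Gaussian kernel can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gaussian_samples`, the `stride` value, and the `sigma_ratio` convention (box edges at roughly three standard deviations) are assumptions made for illustration only, since the abstract does not specify them.

```python
import math


def gaussian_samples(box, stride=4, sigma_ratio=1 / 6):
    """Return {(x, y): weight} for every feature-map cell inside a box.

    box is (x1, y1, x2, y2) in image pixels. stride and sigma_ratio are
    hypothetical values chosen for this sketch, not taken from the paper.
    """
    # Project the box onto the (downsampled) feature map.
    x1, y1, x2, y2 = (v / stride for v in box)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1

    # Assumed convention: the kernel spread scales with the box size.
    sigma_x = max(w * sigma_ratio, 1e-6)
    sigma_y = max(h * sigma_ratio, 1e-6)

    samples = {}
    for y in range(math.ceil(y1), math.floor(y2) + 1):
        for x in range(math.ceil(x1), math.floor(x2) + 1):
            # Cells near the box center get weights close to 1,
            # cells near the border get weights close to 0.
            exponent = ((x - cx) ** 2 / (2 * sigma_x ** 2)
                        + (y - cy) ** 2 / (2 * sigma_y ** 2))
            samples[(x, y)] = math.exp(-exponent)
    return samples


if __name__ == "__main__":
    samples = gaussian_samples((0, 0, 64, 64))
    print(len(samples), "weighted samples instead of a single center point")
```

Under these assumptions, one 64×64 box yields a whole grid of weighted regression samples rather than one center sample, which is the sense in which denser encoding resembles a larger batch.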