Journal: IEEE Transactions on Circuits and Systems for Video Technology (Institute of Electrical and Electronics Engineers)
Date: 2022-04-13
Volume (issue): 32 (9), pp. 6324-6336
Cited by: 16
Identifiers
DOI:10.1109/tcsvt.2022.3167114
Abstract
Object detection is a fundamental problem in computer vision and is widely used in industrial applications such as intelligent manufacturing and intelligent video surveillance. In this work, by investigating the usefulness of highly overlapping proposals, we find that classification and regression differ in their sensitivity to object translation: the regression head is intrinsically more sensitive to translation than the classification head. Based on this finding, we propose a decoupled sampling strategy for deep detectors, named Decoupled R-CNN, which samples proposals separately for the two tasks and thereby induces two sensitivity-specific heads. We further adopt a cascaded structure for the single regression head of Decoupled R-CNN, an extremely simple but highly effective way to improve detection performance. Extensive empirical analyses on real-world datasets show the benefit of the proposed method compared with state-of-the-art models. Code for reproducing the results: https://github.com/shouwangzhe134/Decoupled-R-CNN
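The abstract does not spell out the sampling rule, so the following is only a minimal sketch, assuming that "decoupled sampling" means each head selects its own training proposals by IoU with the ground-truth boxes, using a separate threshold per head. The names `iou` and `decoupled_sample` and the thresholds `cls_pos_thr=0.5` and `reg_pos_thr=0.7` are hypothetical placeholders, not values taken from Decoupled R-CNN. In particular, which head should use the stricter threshold, and whether further random subsampling is applied, should be checked against the paper and the linked repository.

```python
from typing import List, Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2); an assumed convention for this sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def decoupled_sample(
    proposals: Sequence[Box],
    gts: Sequence[Box],
    cls_pos_thr: float = 0.5,  # hypothetical threshold for the classification head
    reg_pos_thr: float = 0.7,  # hypothetical threshold for the regression head
) -> Tuple[List[int], List[int]]:
    """Hypothetical decoupled sampling: each head picks its own training proposals.

    Returns (cls_labels, reg_indices):
      cls_labels[i] is 1 (foreground) or 0 (background) for the classifier;
      reg_indices lists the proposals the regressor is trained on.
    """
    # Best overlap of every proposal with any ground-truth box.
    best = [max((iou(p, g) for g in gts), default=0.0) for p in proposals]
    # Each head applies its own threshold to the same proposal set.
    cls_labels = [1 if v >= cls_pos_thr else 0 for v in best]
    reg_indices = [i for i, v in enumerate(best) if v >= reg_pos_thr]
    return cls_labels, reg_indices


if __name__ == "__main__":
    gts = [(0, 0, 10, 10)]
    proposals = [(0, 0, 10, 10), (1, 0, 11, 10), (5, 5, 15, 15), (3, 0, 13, 10)]
    print(decoupled_sample(proposals, gts))  # prints ([1, 1, 0, 1], [0, 1])
```

Here the proposal `(3, 0, 13, 10)` (IoU about 0.54) counts as foreground for the classifier but is excluded from the regressor's training set, which illustrates how separate sampling lets the two heads see differently distributed proposals.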