Journal: IEEE Transactions on Instrumentation and Measurement [Institute of Electrical and Electronics Engineers] Date: 2023-01-01 Volume/Issue: 72: 1-17 Citations: 4
Identifier
DOI: 10.1109/tim.2023.3330184
Abstract
Prohibited item detection in X-ray images is an essential part of security inspection in many scenarios. Currently, this work is still performed mainly by costly human labor, because the detection accuracy of automatic recognition algorithms needs further improvement. In this paper, we propose two novel ideas, whole-process feature fusion and local-global semantic dependency interaction, to improve the automatic detection of prohibited items. Specifically, for whole-process feature fusion, we design the Coordinated Fusion Backbone (CFB), the Adaptively Refined Feature Pyramid (ARFP), and Selective Dense Feature Interaction (SDFI). For local-global semantic dependency interaction, we develop the ConvFormer (CF). CFB extracts and fuses the multi-scale pyramidal features of the input image in parallel, bridging the gap between high-level semantic context features and low-level spatial-aware features. CF then further models the local-global long-range dependencies to construct the initial fusion features. These features are fed into ARFP, which adaptively fuses and refines them to alleviate the semantic misalignment and information loss of pyramidal features, producing refined features that focus on identifiable regions. Finally, SDFI performs in-depth fusion interactions on the refined features to obtain the optimal feature representation. These improvements are integrated into our model family, PIXDet (T/S/L). Extensive experiments on the challenging SIXray, OPIXray, CLCXray, and PIDray datasets show that the PIXDet detector family achieves strong detection results on these benchmarks (91.2% mAP, 91.6% mAP, 60.4% AP, and 80.0% AP with PIXDet-S) and is competitive with other state-of-the-art (SOTA) methods.
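To make the idea of adaptively fusing pyramidal features more concrete, the following is a minimal illustrative sketch in plain Python. It is not the paper's ARFP or any other PIXDet component: the function names (`upsample_nearest`, `softmax`, `fuse_pyramid`), the nearest-neighbor upsampling, and the per-level `scores` standing in for learned fusion weights are all assumptions chosen for illustration. It only shows the general pattern of bringing a coarse, semantically rich level to the resolution of a fine, spatially detailed level and combining them with normalized weights.

```python
import math


def upsample_nearest(fmap, factor):
    """Nearest-neighbor upsampling of a 2D list by an integer factor."""
    out = []
    for row in fmap:
        # Repeat every value `factor` times along the width...
        wide = [v for v in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times along the height.
        out.extend([list(wide) for _ in range(factor)])
    return out


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pyramid(levels, scores):
    """Fuse pyramid levels (finest first) at the finest resolution.

    levels: list of 2D lists; level i has 1 / 2**i the side length of level 0.
    scores: one raw score per level, a hypothetical stand-in for the
            learned fusion weights a real detector would predict.
    """
    weights = softmax(scores)
    height, width = len(levels[0]), len(levels[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i, (fmap, w) in enumerate(zip(levels, weights)):
        resized = upsample_nearest(fmap, 2 ** i)
        for r in range(height):
            for c in range(width):
                fused[r][c] += w * resized[r][c]
    return fused


# Toy single-channel feature maps: a 4x4 "low-level" map and a 2x2 "high-level" map.
fine = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
coarse = [[10.0, 20.0], [30.0, 40.0]]

# Equal scores give equal weights, so each output cell is the mean of both levels.
fused = fuse_pyramid([fine, coarse], scores=[0.0, 0.0])
```

In a real detector the weights would be predicted per channel or per location from the features themselves, and the maps would be multi-channel tensors; the scalar weights here are kept deliberately simple.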