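Computer science
Feature (linguistics)
Distillation
Artificial intelligence
Machine learning
Representation (politics)
Grasp
Algorithm
Pattern recognition (psychology)
Politics
Philosophy
Organic chemistry
Chemistry
Programming language
Law
Linguistics
Political science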
Authors
Shengjie Cheng, Peiyong Zhou, Yu Liu, Hongji Ma, Alimjan Aysa, Kurban Ubul
Identifier
DOI: 10.1016/j.eswa.2023.122553
Abstract
State-of-the-art CNN-based detection models are difficult to deploy on mobile devices with limited computing power because they carry many redundant parameters and require heavy computation. Knowledge distillation, as a practical model compression approach, can alleviate this limitation. Previous feature-based knowledge distillation algorithms focused on transferring hand-designed local features and, as a result, captured less of the global information in images. To address these shortcomings, we first improve GAMAttention to learn global feature representations in images; the improved attention mechanism minimizes the information loss caused by feature processing. Second, instead of manually defining which features should be shifted, we propose a more interpretable approach in which the student network learns to emulate the high-response feature regions predicted by the teacher network. This makes the model more end-to-end, and feature shifting lets the student network imitate the teacher network in generating semantically strong feature maps, which improves the detection performance of the small model. To avoid learning too many noisy features when learning background features, the two parts of feature distillation are assigned different weights. Finally, logit distillation is performed on the prediction heads of the student and teacher networks. In our experiments, we chose Yolov5 as the base network structure for the teacher-student pairs. By improving Yolov5s with attention and knowledge distillation, we achieve a 1.3% performance gain on VOC and a 1.8% performance gain on KITTI.
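The abstract gives no formulas, so the following is only a minimal sketch of the two loss terms it describes, not the authors' implementation. It assumes that the "two parts" of feature distillation are the teacher's high-response regions and the remaining background, that a high-response region can be approximated by thresholding a non-negative teacher response map against its peak, and that the prediction-head term is a standard temperature-scaled KL divergence. All function names, the threshold, the weights `w_fg` and `w_bg`, and the temperature are hypothetical.

```python
import math

def high_response_mask(teacher_map, threshold):
    """Mark cells whose teacher response is at least threshold * peak.

    Assumes teacher_map holds non-negative responses (hypothetical choice).
    """
    peak = max(max(row) for row in teacher_map)
    return [[1.0 if v >= threshold * peak else 0.0 for v in row]
            for row in teacher_map]

def weighted_feature_loss(student_map, teacher_map, mask, w_fg, w_bg):
    """Mean squared error with separate weights for masked and background cells."""
    total = 0.0
    count = 0
    for s_row, t_row, m_row in zip(student_map, teacher_map, mask):
        for s, t, m in zip(s_row, t_row, m_row):
            weight = w_fg if m == 1.0 else w_bg
            total += weight * (s - t) ** 2
            count += 1
    return total / count

def softmax(logits, temperature):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]

def logit_distillation_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened class scores, scaled by T^2.

    This is the generic soft-target loss, assumed here for illustration.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature * temperature * kl

# Toy 2x2 response maps and 3-class logits (made-up numbers).
teacher = [[0.1, 0.9], [0.2, 1.0]]
student = [[0.0, 0.5], [0.2, 1.0]]
mask = high_response_mask(teacher, threshold=0.5)
feature_loss = weighted_feature_loss(student, teacher, mask, w_fg=1.0, w_bg=0.25)
logit_loss = logit_distillation_loss([1.0, 0.5, -0.5], [2.0, 0.0, -1.0], temperature=2.0)
total_loss = feature_loss + logit_loss
```

Giving the background a smaller weight than the high-response regions is one way to limit the influence of noisy background features while still using them; how the two weights are actually set is not stated in the abstract.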