Distillation
Transformer
Quantization (signal processing)
Computer science
Artificial intelligence
Electrical engineering
Computer vision
Engineering
Chemistry
Chromatography
Voltage
Authors
Navin Ranjan,Andreas Savakis
Abstract
Vision Transformers (ViTs) have demonstrated remarkable performance on a variety of visual tasks, but their high computational and memory costs hinder practical deployment in the real world. Model quantization reduces a model's computation and memory requirements through low-bit representations, and knowledge distillation is used to guide the quantized student network to imitate its full-precision teacher network. However, under ultra-low-bit quantization the student network suffers a noticeable performance drop. This is primarily due to the limited capacity of the smaller network to capture the knowledge of the full-precision teacher, especially when the representation gap between the student and teacher networks is large. In this paper, we introduce a multi-step knowledge distillation approach that uses intermediate quantized teacher assistant (TA) networks with varying bit precision. This multi-step approach enables an ultra-low-bit quantized student network to bridge the gap with the teacher network by gradually reducing the model's bit representation. Each TA network is progressively trained by distilling knowledge from the higher-bit quantized teacher network of the previous step. The target student network then learns from the combined knowledge of the teacher assistants and the full-precision teacher network, which improves its learning capacity even when the knowledge gap is significant. We evaluate our method using the DeiT vision transformer on both ground-level and aerial image classification tasks.
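To make the multi-step idea concrete, the following is a minimal sketch of a progressive distillation pipeline, assuming PyTorch. It is not the authors' implementation: `build_quantized_deit`, the bit schedule, and the loss weighting are hypothetical placeholders, and the quantization details of the TA networks are omitted. It only illustrates the chain "full-precision teacher → higher-bit TAs → ultra-low-bit student", with the final student distilling from the TAs and the full-precision teacher combined.

```python
import torch
import torch.nn.functional as F


def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: cross-entropy on labels plus temperature-scaled KL to a teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kd


def train_one_stage(student, teacher_list, loader, epochs=1, lr=1e-4):
    """Train one network by distilling from one or more (frozen) teachers."""
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            student_logits = student(images)
            loss = 0.0
            for teacher in teacher_list:
                with torch.no_grad():
                    teacher_logits = teacher(images)  # teachers are not updated
                loss = loss + distill_loss(student_logits, teacher_logits, labels)
            loss = loss / len(teacher_list)  # average the per-teacher losses (a simplification)
            opt.zero_grad()
            loss.backward()
            opt.step()


def multi_step_distillation(fp_teacher, build_quantized_deit, loader, bit_schedule=(8, 4, 2)):
    """Chain of teacher assistants at decreasing bit widths, then the ultra-low-bit student.

    `build_quantized_deit(bits)` is a hypothetical factory returning a DeiT with
    `bits`-bit quantized weights/activations.
    """
    teachers = [fp_teacher]  # start from the full-precision teacher
    for bits in bit_schedule[:-1]:
        ta = build_quantized_deit(bits)
        train_one_stage(ta, teachers[-1:], loader)  # each TA distills from the previous, higher-bit teacher
        teachers.append(ta)

    student = build_quantized_deit(bit_schedule[-1])  # target ultra-low-bit network
    train_one_stage(student, [fp_teacher] + teachers[1:], loader)  # combined FP-teacher + TA knowledge
    return student
```

The sketch fixes the TAs once trained and combines their logits with the full-precision teacher's only for the final student stage; intermediate-feature distillation and the specific quantizer are left out for brevity.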