Distillation
Feature (linguistics)
Computer science
Artificial intelligence
Machine learning
Knowledge transfer
Fusion
Pattern recognition (psychology)
Data mining
Chemistry
Knowledge management
Linguistics
Philosophy
Organic chemistry
Authors
Zihan Wang, Junwei Xie, Zhiping Yao, Xu Kuang, Qinquan Gao, Tong Tong
Identifier
DOI:10.1109/bdai56143.2022.9862657
Abstract
The aim of Knowledge Distillation (KD) is to train lightweight student models with extra supervision from large teacher models. Most previous KD methods transfer feature information from teacher to student via connections between feature maps at the same layers. This paper proposes a novel multi-level knowledge distillation method, referred to as Normalized Feature Fusion Knowledge Distillation (NFFKD). The proposed model learns different levels of knowledge to improve network performance. We propose a hierarchical mixed loss (HML) module to minimize the gap between the intermediate feature layers of the teacher and the student, and further reduce the teacher-student gap by normalizing the logits. Experimental results demonstrate that the proposed NFFKD outperforms several state-of-the-art KD methods on public datasets under different settings.
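The abstract describes two components: a feature-level loss between intermediate teacher and student layers, and a logit-level loss computed on normalized logits. The sketch below illustrates these two ideas in PyTorch under stated assumptions; it is not the authors' reference implementation. The per-sample standardization of logits, the 1x1 convolutional adapter that matches student channels to the teacher's, and the loss weighting are all illustrative choices, not details taken from the paper.

```python
# Minimal sketch of the two losses described in the abstract (assumptions noted
# inline): (1) a KD loss on standardized logits, (2) an L2 feature-matching
# loss between one pair of intermediate feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalized_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened distributions of standardized logits."""
    # Standardize logits per sample to shrink the scale gap between teacher
    # and student before applying the temperature (an assumed normalization).
    s = (student_logits - student_logits.mean(dim=1, keepdim=True)) / (
        student_logits.std(dim=1, keepdim=True) + 1e-6)
    t = (teacher_logits - teacher_logits.mean(dim=1, keepdim=True)) / (
        teacher_logits.std(dim=1, keepdim=True) + 1e-6)
    return F.kl_div(
        F.log_softmax(s / temperature, dim=1),
        F.softmax(t / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2


def feature_matching_loss(student_feat, teacher_feat, adapter):
    """L2 gap between an adapted student feature map and the teacher's."""
    return F.mse_loss(adapter(student_feat), teacher_feat)


# Example usage with random tensors standing in for one intermediate stage.
if __name__ == "__main__":
    batch, num_classes = 8, 100
    student_logits = torch.randn(batch, num_classes)
    teacher_logits = torch.randn(batch, num_classes)
    student_feat = torch.randn(batch, 64, 8, 8)   # student stage output
    teacher_feat = torch.randn(batch, 256, 8, 8)  # teacher stage output

    # 1x1 conv adapter so student channels match the teacher's (an assumption).
    adapter = nn.Conv2d(64, 256, kernel_size=1)

    total_loss = (normalized_kd_loss(student_logits, teacher_logits)
                  + 0.5 * feature_matching_loss(student_feat, teacher_feat, adapter))
    print(total_loss.item())
```

In practice the feature term would be summed over several teacher-student layer pairs and combined with the usual cross-entropy loss on ground-truth labels; the weights between these terms are hyperparameters.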