Authors
Dongtong Ma, Kaibing Zhang, Qizhi Cao, Jie Li, Xinbo Gao
Identifier
DOI:10.1016/j.eswa.2024.123892
Abstract
Knowledge distillation (KD) refers to transferring the knowledge learned by a teacher network with a complex architecture and strong learning ability to a lightweight student network with weaker learning ability through a specific distillation strategy. However, most existing KD approaches to image classification employ a single teacher network to guide the training of the student network, so when the teacher network makes an erroneous prediction, the transferred knowledge deteriorates the student network's performance. To mitigate this issue, we develop a novel KD approach, Coordinate Attention Guided Dual-Teacher Adaptive Knowledge Distillation (CAG-DAKD), which delivers more discriminative and comprehensive knowledge obtained from two teacher networks to a compact student network. Specifically, we integrate the positive prediction distributions of the two teacher networks according to whether each teacher predicts correctly and the magnitude of its cross-entropy, producing a better output distribution to guide the student network. Furthermore, to distill the most valuable knowledge from the first teacher network, which has an architecture similar to the student network's, a coordinate attention mechanism is introduced into different layers of the first teacher network so that the student network can effectively learn more discriminative feature representations. We conduct extensive experiments on three standard image classification datasets, CIFAR10, CIFAR100, and ImageNet, to verify the superiority of the proposed method over other state-of-the-art competitors. Code will be available at https://github.com/mdt1219/CAG-DAKD.git/
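The dual-teacher output fusion described in the abstract can be illustrated with a minimal PyTorch sketch. Everything below is an assumption based only on the abstract: the names (fuse_teacher_logits, kd_loss), the temperature T, and the concrete weighting rule (inverse cross-entropy weighting when both teachers are correct, single-teacher fallback otherwise) are illustrative choices, not the authors' released implementation, and the coordinate attention branch for feature-level distillation is not shown.

import torch
import torch.nn.functional as F

def fuse_teacher_logits(logits_t1, logits_t2, targets, T=4.0):
    """Combine two teachers' predictions into one soft target per sample.

    Heuristic (an assumption, not the paper's exact rule): if only one teacher
    classifies a sample correctly, use that teacher's distribution; if both are
    correct, weight them by inverse cross-entropy (the more confident teacher
    gets more weight); if both are wrong, fall back to an equal average.
    """
    p1 = F.softmax(logits_t1 / T, dim=1)
    p2 = F.softmax(logits_t2 / T, dim=1)

    correct1 = logits_t1.argmax(dim=1).eq(targets)
    correct2 = logits_t2.argmax(dim=1).eq(targets)

    ce1 = F.cross_entropy(logits_t1, targets, reduction="none")
    ce2 = F.cross_entropy(logits_t2, targets, reduction="none")
    # Smaller cross-entropy -> larger weight for that teacher.
    w1 = (ce2 / (ce1 + ce2 + 1e-8)).unsqueeze(1)

    fused = torch.where(
        (correct1 & ~correct2).unsqueeze(1), p1,
        torch.where(
            (correct2 & ~correct1).unsqueeze(1), p2,
            torch.where(
                (correct1 & correct2).unsqueeze(1),
                w1 * p1 + (1 - w1) * p2,
                0.5 * (p1 + p2),
            ),
        ),
    )
    return fused

def kd_loss(student_logits, fused_targets, T=4.0):
    """Temperature-scaled KL distillation loss against the fused soft targets."""
    log_q = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_q, fused_targets, reduction="batchmean") * (T * T)

# Example training step (hypothetical): combine the distillation term with the
# usual hard-label cross-entropy on the student's own logits.
# soft_targets = fuse_teacher_logits(t1_logits, t2_logits, labels)
# loss = kd_loss(student_logits, soft_targets) + F.cross_entropy(student_logits, labels)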