Topics: Perspective (graphics); Distillation; Computer science; Artificial intelligence; Process engineering; Data mining; Biochemical engineering; Chromatography; Chemistry; Engineering
Authors
Wei Li, Shitong Shao, Ziming Qiu, Aiguo Song
Source
Journal: Neurocomputing [Elsevier BV]
Date: 2024-05-01
Volume/Article: 583: 127516
Identifier
DOI: 10.1016/j.neucom.2024.127516
Abstract
Knowledge distillation is a capable technique for transferring knowledge from a larger model to a smaller one, thereby notably enhancing the smaller model's performance. Recently, data augmentation has been employed in contrastive-learning-based knowledge distillation techniques, yielding superior results. Despite its significant role, the value of data augmentation remains underappreciated within the domain of knowledge distillation, with no in-depth analysis in the literature thus far. To address this gap, we conduct a multi-perspective theoretical and experimental analysis of the role that data augmentation can play in knowledge distillation. We summarize the properties of data augmentation and list the core findings as follows. (a) Our investigations validate that data augmentation significantly boosts the performance of knowledge distillation on the tasks of image classification and object detection. This holds true even if the teacher model lacks comprehensive information about the augmented samples. Moreover, our novel Joint Data Augmentation (JDA) approach outperforms single data augmentation in knowledge distillation. (b) The pivotal role of data augmentation in knowledge distillation can be theoretically explained via Sharpness-Aware Minimization. (c) The compatibility of data augmentation with various knowledge distillation methods can enhance their performance. In light of these observations, we propose a new method called Cosine Confidence Distillation (CCD) for more reasonable knowledge transfer from augmented samples. Experimental results not only demonstrate that CCD becomes the state-of-the-art method with a lower storage requirement on CIFAR-100 and ImageNet-1k, but also validate the superiority of CCD over DIST on the object detection benchmark dataset, MS-COCO.
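For context, the setup the abstract builds on can be sketched as follows. This is a minimal NumPy illustration of the standard temperature-scaled KL distillation loss (Hinton-style), with both teacher and student evaluated on the same augmented input; it is not the paper's CCD or JDA method, and the Gaussian-noise `augment` function is a hypothetical stand-in for a real augmentation pipeline.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T produces softer probabilities."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((T * T) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())

def augment(x, rng, sigma=0.1):
    """Hypothetical augmentation: additive Gaussian noise (stand-in only)."""
    return np.asarray(x, dtype=float) + rng.normal(0.0, sigma, size=np.shape(x))
```

In distillation with augmentation, the key point the abstract makes is that the teacher and student both process the same augmented sample (`x_aug = augment(x, rng)`), so the teacher's soft targets are computed on inputs it may never have seen during its own training.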