Keywords
Feature (linguistics), Benchmark (surveying), Computer science, Tensor (intrinsic definition), Singular value decomposition, Process (computing), Rank (graph theory), Dimension (graph theory), Artificial intelligence, Representation (politics), Machine learning, Pattern recognition (psychology), Mathematics, Philosophy, Linguistics, Geodesy, Pure mathematics, Geography, Combinatorics, Politics, Political science, Law, Operating system
Authors
Majid Sepahvand, Fardin Abdali-Mohammadi, Amir Taherkordi
Identifier
DOI:10.1016/j.eswa.2022.117474
Abstract
Recent studies on feature-based knowledge distillation (KD) show that a student model cannot properly imitate a teacher's behavior when there is a large mismatch in spatial shape between the teacher's inner layers and the student's. This paper proposes the hypothesis that breaking down the knowledge in the feature maps of a teacher's inner layers and then distilling that knowledge into a student's inner layers can bridge the gap between an advanced teacher and a student. Besides improving the student's performance, this process also helps the student model comprehend the knowledge better. Hence, this paper embeds feature-based KD modules between a teacher model and a student model. In addition to extracting a tensor of feature maps from the teacher's inner layers, these modules are responsible for decomposing this high-dimensional tensor through higher-order singular value decomposition and then distilling the useful knowledge in the teacher's feature maps into the student. According to various evaluations on two benchmark datasets, reported in the Experimental Results together with paired t-tests, adding the tensor decomposition approach to the feature-based KD module played a major role in enhancing the performance of the student model, which produced competitive results compared with state-of-the-art methods.
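The abstract's central operation is breaking a high-dimensional feature-map tensor down with a higher-order singular value decomposition. Below is a minimal NumPy sketch of that decomposition (truncated HOSVD / Tucker form), not the authors' implementation: the tensor shape, the truncation ranks, and the helper names are illustrative assumptions.

```python
import numpy as np

def mode_unfold(tensor, mode):
    """Mode-n matricization: unfold `tensor` along `mode` into a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD.

    Returns a small core tensor (shape = `ranks`) plus one orthonormal
    factor matrix per mode; together they approximate `tensor`.
    """
    # The leading left singular vectors of each mode-n unfolding span
    # that mode's dominant subspace.
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(mode_unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    # Project the tensor onto each mode's truncated basis to get the core.
    core = tensor
    for mode, u in enumerate(factors):
        core = np.moveaxis(
            np.tensordot(core, u.conj().T, axes=([mode], [1])), -1, mode)
    return core, factors

# A toy "feature map" tensor, e.g. (channels, height, width) from one layer.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((16, 8, 8))
core, factors = hosvd(fmap, ranks=(4, 4, 4))
print(core.shape)                    # compressed core: (4, 4, 4)
print([f.shape for f in factors])    # per-mode bases
```

In a feature-based KD setting, the compact core (or the projections it induces) is what would be matched between teacher and student layers of different spatial shapes, since the truncation discards low-energy directions of each mode.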