2 code implementations • 28 Feb 2021 • Aryan Asadian, Amirali Salehi-Abari
However, when there is a large difference between the model complexities of the teacher and the student (i.e., a capacity gap), knowledge distillation loses its strength in transferring knowledge from the teacher to the student, thus yielding a weaker student.
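For context, the standard knowledge-distillation loss that the capacity gap weakens can be sketched as below. This is a minimal NumPy illustration of the classic Hinton-style formulation (temperature-softened teacher targets matched via KL divergence), not the method proposed in this paper; the function names and the temperature value are illustrative choices.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T produces softer distributions,
    # exposing the teacher's "dark knowledge" about non-target classes.
    z = logits / T
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence from the softened teacher distribution to the softened
    # student distribution, scaled by T^2 (the usual gradient-magnitude fix).
    p = softmax(teacher_logits, T)   # teacher soft targets
    q = softmax(student_logits, T)   # student soft predictions
    return float(T**2 * np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 1.0, 0.5])
student = np.array([3.5, 1.2, 0.3])
print(kd_loss(student, teacher))     # small positive value: close logits
print(kd_loss(teacher, teacher))     # zero: identical distributions
```

In practice this term is combined with the ordinary cross-entropy on ground-truth labels; when the teacher's distribution is far more expressive than the student can match, minimizing this loss transfers less useful signal, which is the capacity-gap problem described above.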