TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Knowledge Distillation	ImageNet	VkD (T:RegNety 160 S:DeiT-S)	Top-1 accuracy %	82.3	# 2
Knowledge Distillation	ImageNet	VkD (T:RegNety 160 S:DeiT-S)	model size	22M	# 8
Knowledge Distillation	ImageNet	VkD (T:RegNety 160 S:DeiT-S)	CRD training setting	✘	# 1
Knowledge Distillation	ImageNet	VkD (T:RegNety 160 S:DeiT-Ti)	Top-1 accuracy %	79.2	# 5
Knowledge Distillation	ImageNet	VkD (T:RegNety 160 S:DeiT-Ti)	model size	6M	# 11
Knowledge Distillation	ImageNet	VkD (T:RegNety 160 S:DeiT-Ti)	CRD training setting	✘	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/v-kd-improving-knowledge-distillation-using/knowledge-distillation-on-imagenet)](https://paperswithcode.com/sota/knowledge-distillation-on-imagenet?p=v-kd-improving-knowledge-distillation-using)`

$V_kD$: Improving Knowledge Distillation using Orthogonal Projections

10 Mar 2024 · Roy Miles, Ismail Elezi, Jiankang Deng

Knowledge distillation is an effective method for training small and efficient deep learning models. However, the efficacy of a single method can degenerate when transferring to other tasks, modalities, or even other architectures. To address this limitation, we propose a novel constrained feature distillation method. This method is derived from a small set of core principles, which results in two emerging components: an orthogonal projection and a task-specific normalisation. Equipped with both of these components, our transformer models can outperform all previous methods on ImageNet and reach up to a 4.4% relative improvement over the previous state-of-the-art methods. To further demonstrate the generality of our method, we apply it to object detection and image generation, whereby we obtain consistent and substantial performance improvements over state-of-the-art. Code and models are publicly available: https://github.com/roymiles/vkd
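The two components named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes a projection matrix with orthonormal rows built by QR decomposition, per-dimension standardisation of the teacher features as one possible normalisation, and a mean squared error loss. The function names and the toy dimensions (16 and 64) are hypothetical.

```python
import numpy as np


def orthonormal_projection(d_student, d_teacher, seed=0):
    """Return a (d_student, d_teacher) matrix P with orthonormal rows (P @ P.T = I).

    Assumption: built from the QR decomposition of a random Gaussian matrix.
    The paper's own parameterisation may differ.
    """
    if d_student > d_teacher:
        raise ValueError("this sketch assumes d_student <= d_teacher")
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((d_teacher, d_student))
    q, _ = np.linalg.qr(a)  # q has shape (d_teacher, d_student), orthonormal columns
    return q.T


def standardise(features, eps=1e-6):
    """Zero-mean, unit-variance per feature dimension (an assumed normalisation)."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


def distillation_loss(student_feats, teacher_feats, projection):
    """Mean squared error between projected student and normalised teacher features."""
    projected = student_feats @ projection
    target = standardise(teacher_feats)
    return float(np.mean((projected - target) ** 2))


# Toy batch: 8 samples, student width 16, teacher width 64 (hypothetical sizes).
rng = np.random.default_rng(1)
student = rng.standard_normal((8, 16))
teacher = rng.standard_normal((8, 64))
P = orthonormal_projection(16, 64)

# With orthonormal rows, pairwise inner products of student features are preserved.
gram_before = student @ student.T
gram_after = (student @ P) @ (student @ P).T
print(np.allclose(gram_before, gram_after))
print(distillation_loss(student, teacher, P))
```

Because `P @ P.T` is the identity, the projection maps student features into the teacher's dimension without changing their pairwise similarities, which is the property the first `print` checks. In a training loop the projection would be learned under the orthogonality constraint rather than fixed at random as it is here.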

Code

roymiles/vkd official

Tasks

Image Generation

Knowledge Distillation

Object Detection

Datasets

CIFAR-10

ImageNet

MS COCO

CIFAR-100

ImageNet-1K

Results from the Paper

Ranked #2 on Knowledge Distillation on ImageNet
