Information Theoretic Representation Distillation

1 Dec 2021  ·  Roy Miles, Adrian Lopez Rodriguez, Krystian Mikolajczyk

Despite the empirical success of knowledge distillation, current state-of-the-art methods are computationally expensive to train, which makes them difficult to adopt in practice. To address this problem, we introduce two distinct complementary losses inspired by a cheap entropy-like estimator. These losses aim to maximise the correlation and mutual information between the student and teacher representations. Our method incurs significantly lower training overhead than other approaches and achieves performance competitive with the state of the art on knowledge distillation and cross-model transfer tasks. We further demonstrate the effectiveness of our method on a binary distillation task, where it sets a new state of the art for binary quantisation and approaches the performance of a full-precision model. Code: www.github.com/roymiles/ITRD
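The abstract does not spell out the loss definitions, so the following is only a minimal PyTorch sketch of what a correlation-maximising distillation loss between student and teacher features can look like. The function name `correlation_distillation_loss`, the per-dimension standardisation, and the squared-distance-to-one objective are illustrative assumptions, not the paper's ITRD losses; refer to the code link above for the official implementation.

```python
# Hypothetical sketch of a correlation-style distillation loss between
# student and teacher representations. Not the exact ITRD formulation;
# names and constants are illustrative only.
import torch


def correlation_distillation_loss(z_s: torch.Tensor, z_t: torch.Tensor) -> torch.Tensor:
    """Encourage high per-dimension correlation between student features z_s
    and teacher features z_t, both of shape (batch, dim). Assumes the student
    features have already been projected to the teacher's dimensionality."""
    # Standardise each feature dimension over the batch.
    z_s = (z_s - z_s.mean(dim=0)) / (z_s.std(dim=0) + 1e-6)
    z_t = (z_t - z_t.mean(dim=0)) / (z_t.std(dim=0) + 1e-6)
    n = z_s.size(0)
    # Cross-correlation matrix between student and teacher feature dimensions.
    c = (z_s.T @ z_t) / n
    # Push the diagonal (matched-dimension correlations) towards 1.
    return ((torch.diagonal(c) - 1.0) ** 2).mean()


# Usage example with random features standing in for network activations.
if __name__ == "__main__":
    z_student = torch.randn(64, 128, requires_grad=True)
    z_teacher = torch.randn(64, 128)
    loss = correlation_distillation_loss(z_student, z_teacher)
    loss.backward()
    print(loss.item())
```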

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Classification with Binary Weight Network | CIFAR-10 | ResNet-18 | Top-1 | 94.1 | #4 |
| Knowledge Distillation | CIFAR-100 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | #7 |
| Knowledge Distillation | CIFAR-100 | resnet110 (T: resnet110, S: resnet20) | Top-1 Accuracy (%) | 71.99 | #21 |
| Knowledge Distillation | CIFAR-100 | vgg8 (T: vgg13, S: vgg8) | Top-1 Accuracy (%) | 74.93 | #14 |
| Knowledge Distillation | ImageNet | ITRD (T: ResNet-34, S: ResNet-18) | Top-1 accuracy % | 71.68 | #30 |
| Knowledge Distillation | ImageNet | ITRD (T: ResNet-34, S: ResNet-18) | model size | 11.69M | #10 |
| Knowledge Distillation | ImageNet | ITRD (T: ResNet-34, S: ResNet-18) | CRD training setting | – | #1 |
| Question Answering | SQuAD1.1 | BERT - 3 Layers | EM | 77.7 | #97 |
| Question Answering | SQuAD1.1 | BERT - 3 Layers | F1 | 85.8 | #89 |
| Question Answering | SQuAD1.1 | BERT - 6 Layers | EM | 81.5 | #50 |
| Question Answering | SQuAD1.1 | BERT - 6 Layers | F1 | 88.5 | #52 |