Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.
We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet.
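To make the notion of "inducing sparsity" concrete, the sketch below shows unstructured magnitude pruning, a common baseline among such techniques: weights whose absolute value falls below a threshold are masked to zero. This is a minimal illustration in PyTorch; the helper name `magnitude_prune` is hypothetical, and in practice sparsity is usually ramped up gradually over training rather than applied once.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that zeroes the `sparsity` fraction of weights
    with the smallest magnitude (ties may push slightly past the target)."""
    k = int(weight.numel() * sparsity)  # number of weights to remove
    if k == 0:
        return torch.ones_like(weight)
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

# Usage: zero out 90% of a layer's weights by magnitude.
w = torch.randn(256, 512)
mask = magnitude_prune(w, sparsity=0.9)
w_pruned = w * mask
```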
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients are approximated with the Straight-Through Estimator.
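As a sketch of how the Straight-Through Estimator is typically wired up, the example below fake-quantizes weights in the forward pass and passes gradients through the non-differentiable rounding step unchanged in the backward pass. This is a minimal PyTorch illustration assuming symmetric uniform quantization; the class name `QuantizeSTE` is hypothetical, not a library API.

```python
import torch

class QuantizeSTE(torch.autograd.Function):
    """Fake-quantize in forward; straight-through gradients in backward."""

    @staticmethod
    def forward(ctx, w, num_bits=8):
        # Symmetric uniform quantization to 2^(num_bits-1) - 1 levels per sign.
        qmax = 2 ** (num_bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # STE: treat the rounding step as the identity for gradients.
        return grad_output, None

# Usage inside a layer's forward pass: compute with quantized weights,
# but let gradients flow to the full-precision copy.
w = torch.randn(64, 64, requires_grad=True)
w_q = QuantizeSTE.apply(w)
w_q.sum().backward()  # w.grad is populated as if w_q were w
```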
Deep neural networks (DNNs) continue to make significant advances, solving tasks ranging from image classification to translation and reinforcement learning.
Model compression is a critical technique for efficiently deploying neural network models on mobile devices, which have limited computational resources and tight power budgets.
Deep neural networks (DNNs) are powerful but computationally expensive and memory intensive, which impedes their practical use on resource-constrained front-end devices.
Recent advances in model compression have provided procedures for compressing large neural networks to a fraction of their original size while retaining most, if not all, of their accuracy.