Model Compression
342 papers with code • 2 benchmarks • 4 datasets
Model compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed to reduce the size of deep networks.
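To make two of these methods concrete, here is a minimal sketch of magnitude-based parameter pruning and symmetric uniform weight quantization on a flat list of weights. The function names, the example weights, and the choice of per-list scaling are assumptions made for illustration; real systems typically operate on tensors and use more refined criteria.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices sorted by |w|, smallest first; the first k are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_uniform(weights, num_bits=8):
    """Map floats to signed integer codes that share a single scale factor."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.5]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_uniform(weights, num_bits=4)
restored = dequantize(codes, scale)
```

With 4 bits, every restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the price paid for storing small integers instead of floats.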
Most implemented papers
Training with Quantization Noise for Extreme Model Compression
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
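The standard approach described here can be sketched on a one-parameter model. This is a toy illustration of Quantization Aware Training with the Straight-Through Estimator, not the Quant-Noise method of the paper; the quantization step of 0.25, the learning rate, and the data are assumptions.

```python
def fake_quantize(w, step=0.25):
    """Round a weight to the nearest multiple of `step` (used in the forward pass)."""
    return round(w / step) * step


def train_qat(data, lr=0.05, epochs=200, step=0.25):
    """Fit y = w * x while always predicting with the quantized weight."""
    w = 0.0  # latent full-precision weight, the one SGD actually updates
    for _ in range(epochs):
        for x, y in data:
            w_q = fake_quantize(w, step)   # forward pass sees the quantized weight
            error = w_q * x - y            # prediction error
            grad_wq = 2.0 * error * x      # d(squared error) / d(w_q)
            # Straight-Through Estimator: rounding has zero gradient almost
            # everywhere, so d(w_q)/d(w) is treated as 1 and the gradient
            # with respect to w_q is applied directly to the latent weight.
            w -= lr * grad_wq
    return w, fake_quantize(w, step)


# Toy data generated from y = 0.7 * x; 0.7 is not on the 0.25 grid.
data = [(1.0, 0.7), (2.0, 1.4), (-1.0, -0.7)]
latent_w, quantized_w = train_qat(data)
```

Because 0.7 is not representable on the grid, the latent weight settles near the rounding boundary between the two neighbouring grid points, 0.5 and 0.75, and the deployed weight is one of those two values.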
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Text to speech (TTS) has been broadly used to synthesize natural and intelligible speech in different scenarios.
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
The ability to act in multiple environments and transfer previous knowledge to new situations can be considered a critical aspect of any intelligent agent.
MicroExpNet: An Extremely Small and Fast Model For Expression Recognition From Face Images
On the other hand, knowledge distillation (KD) proves useful for model compression in the facial expression recognition (FER) problem, and we discovered that its effect becomes more significant as the model size decreases.
Patient Knowledge Distillation for BERT Model Compression
Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks.
Contrastive Representation Distillation
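We demonstrate that the standard knowledge distillation objective, which minimizes the KL divergence between the output distributions of the teacher and student networks, ignores important structural knowledge of the teacher network.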
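For context, the conventional distillation objective is usually written as the KL divergence between temperature-softened teacher and student output distributions. The following is a minimal sketch of that baseline loss, assuming raw logits as input and a temperature of 4; it is not the contrastive objective proposed in the paper, and the logit values are made up.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


# Hypothetical logits for a three-class problem.
teacher = [5.0, 1.0, -2.0]
student = [2.0, 2.5, -1.0]
loss_mismatch = kd_loss(teacher, student)
loss_match = kd_loss(teacher, teacher)
```

The loss treats each output dimension independently given the input, which is the sense in which it can miss structure in the teacher's representation.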
Data-Free Adversarial Distillation
Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer.
ZeroQ: A Novel Zero Shot Quantization Framework
Importantly, ZeroQ has a very low computational overhead, and it can finish the entire quantization process in less than 30s (0.5% of one epoch training time of ResNet50 on ImageNet).
Sharpness-aware Quantization for Deep Neural Networks
However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading the performance.
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
As the training of giant dense models hits the limits of the availability and capability of today's hardware resources, Mixture-of-Experts (MoE) models have become one of the most promising model architectures due to their significant training cost reduction compared to a quality-equivalent dense model.
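The cost reduction comes from sparse activation: each input is processed by only a small subset of the experts. Below is a toy sketch of top-1 routing on scalar inputs, assuming a linear gate and three hypothetical linear experts. It is not DeepSpeed-MoE's implementation, and the gate parameters are invented for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_params, experts):
    """Route input x to the single expert with the highest gate probability."""
    scores = [w * x + b for w, b in gate_params]  # one linear gate score per expert
    probs = softmax(scores)
    best = max(range(len(experts)), key=lambda i: probs[i])
    # Only the selected expert is evaluated; the others cost nothing.
    return probs[best] * experts[best](x), best


calls = []  # records which experts were actually executed


def make_expert(index, slope):
    """Build a hypothetical linear expert that logs each call."""
    def expert(x):
        calls.append(index)
        return slope * x
    return expert


experts = [make_expert(0, 2.0), make_expert(1, -1.0), make_expert(2, 0.5)]
gate_params = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)]

out_a, chosen_a = moe_forward(2.0, gate_params, experts)
out_b, chosen_b = moe_forward(-2.0, gate_params, experts)
```

After both calls, `calls` contains exactly two entries, one per input, even though the model holds three experts. Adding experts therefore grows the parameter count without growing the per-input compute.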