Model Compression

340 papers with code • 2 benchmarks • 4 datasets

Model Compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power and resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization and weight quantization are some of the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
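
To make the three ideas above concrete, here is a minimal NumPy sketch applying magnitude pruning, rank-k factorization via SVD, and symmetric 8-bit quantization to a single random weight matrix. All sizes and thresholds are illustrative, not drawn from any of the papers below.

```python
# Illustrative sketch of three classic compression ideas on one weight matrix.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512)).astype(np.float32)

# 1) Parameter pruning: zero out the smallest-magnitude 90% of weights.
threshold = np.quantile(np.abs(W), 0.90)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# 2) Low-rank factorization: keep the top-k singular values, W ~ A @ B,
#    storing two thin matrices instead of one dense one.
k = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * S[:k]          # (256, k)
B = Vt[:k, :]                 # (k, 512)
W_lowrank = A @ B

# 3) Weight quantization: symmetric uniform 8-bit quantization.
scale = np.abs(W).max() / 127.0
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_int8.astype(np.float32) * scale

print("pruned nonzero fraction:", np.count_nonzero(W_pruned) / W.size)
print("low-rank param fraction:", (A.size + B.size) / W.size)
print("low-rank MSE:", float(np.mean((W - W_lowrank) ** 2)))
print("quantization MSE:", float(np.mean((W - W_dequant) ** 2)))
```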

Latest papers with no code

Optimal Policy Sparsification and Low Rank Decomposition for Deep Reinforcement Learning

no code yet • 10 Mar 2024

The results suggest that our custom $L_0$-norm-regularization technique for sparsification of DRL policies is a promising avenue to reduce computational resources and limit overfitting.
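
The paper's custom $L_0$ regularizer is not reproduced here; as a generic illustration of the idea, the sketch below implements the well-known hard-concrete gate surrogate of Louizos et al. (2018), where a differentiable expected-$L_0$ penalty drives individual weights of a (policy) network to exact zero.

```python
# Generic L0-style sparsification via stochastic hard-concrete gates
# (after Louizos et al.); a surrogate, not this paper's custom method.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class L0Linear(nn.Module):
    """Linear layer with per-weight hard-concrete gates; the expected number
    of open gates is a differentiable surrogate for the L0 norm."""

    def __init__(self, d_in, d_out, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.05)
        self.log_alpha = nn.Parameter(torch.zeros(d_out, d_in))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def _gate(self):
        if self.training:  # stretched, rescaled concrete sample
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (-u).log1p() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def forward(self, x):
        return F.linear(x, self.weight * self._gate())

    def expected_l0(self):
        # Probability that each gate is nonzero, summed over all weights.
        shift = self.beta * math.log(-self.gamma / self.zeta)
        return torch.sigmoid(self.log_alpha - shift).sum()
```

During training one would add a term like `lam * layer.expected_l0()` to the task loss, trading accuracy against sparsity through `lam`.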

Towards efficient deep autoencoders for multivariate time series anomaly detection

no code yet • 4 Mar 2024

First, pruning reduces the number of weights, while preventing catastrophic drops in accuracy by means of a fast search process that identifies high sparsity levels.
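
The exact search procedure is the paper's; purely as a sketch of the idea, the snippet below binary-searches for the highest global magnitude-pruning sparsity whose validation metric stays within a tolerance of the dense baseline. The `evaluate` callback and the pruning rule are placeholders.

```python
# Hedged sketch: search the highest sparsity that keeps accuracy within `tol`.
import copy
import torch

def magnitude_prune_(model, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of each weight tensor."""
    for p in model.parameters():
        if p.dim() > 1:
            k = int(p.numel() * sparsity)
            if k > 0:
                thresh = p.abs().flatten().kthvalue(k).values
                p.data[p.abs() <= thresh] = 0.0

def highest_safe_sparsity(model, evaluate, baseline, tol=0.01, steps=8):
    """Binary search; `evaluate(model) -> float` is assumed to be given."""
    lo, hi, best = 0.0, 1.0, 0.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        candidate = copy.deepcopy(model)
        magnitude_prune_(candidate, mid)
        if evaluate(candidate) >= baseline - tol:
            best, lo = mid, mid   # still accurate enough: try sparser
        else:
            hi = mid              # too sparse: back off
    return best
```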

Differentially Private Knowledge Distillation via Synthetic Text Generation

no code yet • 1 Mar 2024

However, the increasing urgency of data privacy requires LLMs to be trained with Differential Privacy (DP) on private data.
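
The distillation pipeline is the paper's contribution; the DP ingredient it builds on is standard DP-SGD, sketched below with per-example gradient clipping and Gaussian noise. Hyperparameters are placeholders, and this is not the paper's pipeline.

```python
# Minimal DP-SGD step: clip each example's gradient, sum, add noise, average.
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                clip_norm=1.0, noise_multiplier=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(batch_x, batch_y):          # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        total = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        scale = (clip_norm / (total + 1e-12)).clamp(max=1.0)
        for s, p in zip(summed, params):
            s.add_(p.grad, alpha=float(scale))

    n = len(batch_x)
    for p, s in zip(params, summed):            # noise calibrated to the clip
        p.grad = (s + torch.randn_like(s) * noise_multiplier * clip_norm) / n
    optimizer.step()
    optimizer.zero_grad()
```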

FinGPT-HPC: Efficient Pretraining and Finetuning Large Language Models for Financial Applications with High-Performance Computing

no code yet • 21 Feb 2024

However, the resulting model still consumes a large amount of GPU memory.

PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

no code yet • 20 Feb 2024

Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression.
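
The prompt-tuning component is specific to PromptKD, but the underlying objective is the classic temperature-softened KL divergence between teacher and student logits (Hinton et al.). A minimal version, applied per token position for a generative LM:

```python
# Standard knowledge-distillation loss; not PromptKD's full method.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale gradients for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```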

From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges

no code yet • 20 Feb 2024

Generative Artificial Intelligence (AI) has shown tremendous prospects in all aspects of technology, including design.

Towards a tailored mixed-precision sub-8-bit quantization scheme for Gated Recurrent Units using Genetic Algorithms

no code yet • 19 Feb 2024

Despite the recent advances in model compression techniques for deep neural networks, deploying such models on ultra-low-power embedded devices still proves challenging.
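
As a toy illustration of the search space involved, the sketch below fake-quantizes each weight tensor of a GRU to its own bit-width; a genetic algorithm (not shown) would evolve the `bitwidths` vector, scoring each candidate by accuracy and footprint. The functions are illustrative, not the paper's scheme.

```python
# Toy per-tensor fake quantizer for a mixed-precision sub-8-bit scheme.
import torch

def fake_quantize(w, bits):
    """Symmetric uniform quantization of tensor `w` to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp_min(1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

def quantize_gru_(gru, bitwidths):
    """Apply one bit-width per weight tensor of a torch.nn.GRU, in place."""
    tensors = [p for name, p in gru.named_parameters() if "weight" in name]
    for p, bits in zip(tensors, bitwidths):
        p.data = fake_quantize(p.data, bits)
```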

Extraction of nonlinearity in neural networks and model compression with Koopman operator

no code yet • 18 Feb 2024

Nonlinearity plays a crucial role in deep neural networks.
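
The paper's construction is not reproduced here; as background, a Koopman operator can be approximated from data by lifting states into observables and solving a least-squares problem (DMD-style), as in this toy sketch:

```python
# Generic least-squares Koopman approximation on lifted observables.
import numpy as np

def fit_koopman(X, Y):
    """X, Y: (d, T) arrays of observables at successive steps.
    Returns K minimizing ||Y - K X||_F, i.e. K = Y X^+."""
    return Y @ np.linalg.pinv(X)

# Example: lift the nonlinear map x' = a*x*(1-x) with observables (x, x^2).
a = 2.5
x = np.random.default_rng(1).uniform(0, 1, 200)
x_next = a * x * (1 - x)
X = np.vstack([x, x ** 2])
Y = np.vstack([x_next, x_next ** 2])
K = fit_koopman(X, Y)
# First row recovers the exact linear action on (x, x^2): x' = a*x - a*x^2.
print(np.round(K[0], 3))   # approximately [ 2.5, -2.5 ]
```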

Fast Vocabulary Transfer for Language Model Compression

no code yet • 15 Feb 2024

Real-world business applications require a trade-off between language model performance and size.
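
A common recipe for vocabulary transfer, and plausibly the spirit of this one (treat the details as an assumption, not the paper's exact method), initializes each new token's embedding by tokenizing it with the old tokenizer and averaging the corresponding old embeddings. The tokenizer/embedding interfaces below follow the Hugging Face convention.

```python
# Hedged sketch of vocabulary-transfer initialization for a new tokenizer.
import torch

def transfer_embeddings(old_tokenizer, old_embedding, new_vocab):
    """old_embedding: (V_old, d) tensor; new_vocab: list of token strings.
    Returns a (V_new, d) tensor of initialized embeddings."""
    d = old_embedding.size(1)
    new_emb = torch.empty(len(new_vocab), d)
    for i, token in enumerate(new_vocab):
        old_ids = old_tokenizer.encode(token, add_special_tokens=False)
        if old_ids:
            new_emb[i] = old_embedding[old_ids].mean(dim=0)
        else:
            new_emb[i] = torch.randn(d) * 0.02   # fallback for unmappable tokens
    return new_emb
```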

Model Compression and Efficient Inference for Large Language Models: A Survey

no code yet • 15 Feb 2024

However, large language models have two prominent characteristics compared to smaller models: (1) most compression algorithms require finetuning or even retraining the model after compression.