Model Compression

342 papers with code • 2 benchmarks • 4 datasets

Model Compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power and resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
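
As a concrete illustration of the parameter-pruning approach mentioned above, here is a minimal sketch of global magnitude pruning in PyTorch. The toy model, layer types, and sparsity level are illustrative assumptions, not drawn from any of the papers listed below.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights across all Linear and Conv2d layers."""
    weights = [m.weight.data for m in model.modules()
               if isinstance(m, (nn.Linear, nn.Conv2d))]
    scores = torch.cat([w.abs().flatten() for w in weights])
    threshold = torch.quantile(scores, sparsity)   # global magnitude threshold
    for w in weights:
        w.mul_((w.abs() > threshold).float())      # keep only weights above the threshold

# Hypothetical toy model, used only to show the call.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
magnitude_prune(model, sparsity=0.8)
```

In practice such pruning is usually followed by fine-tuning, and the resulting sparse weights only save compute on hardware or kernels that exploit sparsity.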

Latest papers with no code

Enhanced Sparsification via Stimulative Training

no code yet • 11 Mar 2024

To alleviate this issue, we first study and reveal the relative sparsity effect in emerging stimulative training, and then propose a structured pruning framework, named STP, based on an enhanced sparsification paradigm that maintains the magnitude of dropped weights and enhances the expressivity of kept weights through self-distillation.
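
For background on the distillation component referenced here, below is a minimal sketch of a standard teacher-student distillation loss (soft targets with temperature). It is a generic example, not the STP self-distillation procedure; the temperature and weighting are illustrative assumptions.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.5):
    """Blend hard-label cross-entropy with KL divergence to the teacher's soft targets."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps soft-target gradients on a comparable scale across temperatures.
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

The weighting `alpha` trades off imitation of the teacher against fitting the hard labels; both terms are commonly kept during student training.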

Optimal Policy Sparsification and Low Rank Decomposition for Deep Reinforcement Learning

no code yet • 10 Mar 2024

The results suggest that our custom $L_0$-norm-regularization technique for sparsification of DRL policies is a promising avenue to reduce computational resources and limit overfitting.
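
To illustrate the low-rank decomposition idea named in the title, here is a minimal sketch that factorizes a linear layer with a truncated SVD. The rank is an arbitrary illustrative choice, and this is not the paper's DRL-specific procedure.

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace W (out x in) with two smaller factors: (out x rank) @ (rank x in)."""
    W = layer.weight.data                              # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = torch.diag(S[:rank]) @ Vh[:rank]   # (rank, in_features)
    second.weight.data = U[:, :rank]                        # (out_features, rank)
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

layer = nn.Linear(512, 512)
compressed = factorize_linear(layer, rank=64)   # roughly 4x fewer weight parameters
```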

Towards efficient deep autoencoders for multivariate time series anomaly detection

no code yet • 4 Mar 2024

First, pruning reduces the number of weights while preventing catastrophic drops in accuracy by means of a fast search process that identifies high sparsity levels.

Differentially Private Knowledge Distillation via Synthetic Text Generation

no code yet • 1 Mar 2024

However, the increasing urgency of data privacy requires training LLMs with Differential Privacy (DP) on private data.

FinGPT-HPC: Efficient Pretraining and Finetuning Large Language Models for Financial Applications with High-Performance Computing

no code yet • 21 Feb 2024

However, the resulting model still consumes a large amount of GPU memory.

PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

no code yet • 20 Feb 2024

Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression.

From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges

no code yet • 20 Feb 2024

Generative Artificial Intelligence (AI) has shown tremendous prospects in all aspects of technology, including design.

Towards a tailored mixed-precision sub-8-bit quantization scheme for Gated Recurrent Units using Genetic Algorithms

no code yet • 19 Feb 2024

Despite the recent advances in model compression techniques for deep neural networks, deploying such models on ultra-low-power embedded devices still proves challenging.
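
As generic context for sub-8-bit weight quantization, the following sketch performs per-tensor uniform affine quantization to a chosen bit width. It is not the paper's genetic-algorithm-based mixed-precision scheme for GRUs; the 4-bit setting is an illustrative assumption.

```python
import torch

def quantize_dequantize(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Map weights to num_bits integer levels and back, returning the rounded tensor."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / (qmax - qmin)
    zero_point = torch.round(-w_min / scale).clamp(qmin, qmax)
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)   # integer codes
    return (q - zero_point) * scale                                    # dequantized weights

w = torch.randn(256, 256)
w_q = quantize_dequantize(w, num_bits=4)
print(f"quantization error (MSE): {(w - w_q).pow(2).mean():.6f}")
```

Mixed-precision schemes extend this idea by assigning a different bit width to each tensor or layer instead of a single global setting.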

Extraction of nonlinearity in neural networks and model compression with Koopman operator

no code yet • 18 Feb 2024

Nonlinearity plays a crucial role in deep neural networks.

Fast Vocabulary Transfer for Language Model Compression

no code yet • 15 Feb 2024

Real-world business applications require a trade-off between language model performance and size.