Model Compression
340 papers with code • 2 benchmarks • 4 datasets
Model Compression has been an actively pursued area of research in recent years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization and weight quantization are among the methods proposed to reduce the size of deep networks.
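As a rough illustration of these three techniques, the PyTorch sketch below applies magnitude pruning, 8-bit weight quantization, and a rank-32 factorization to a single toy linear layer. The layer sizes, sparsity level, bit-width, and rank are arbitrary choices for illustration, not settings from any particular paper.

import torch
import torch.nn as nn

# Toy stand-in for one layer of a larger network; all settings are illustrative.
layer = nn.Linear(256, 128)

with torch.no_grad():
    # Parameter pruning: zero the 80% of weights with the smallest magnitude.
    w = layer.weight
    threshold = w.abs().flatten().kthvalue(int(0.8 * w.numel())).values
    w.mul_((w.abs() > threshold).float())

    # Weight quantization: symmetric 8-bit, per-tensor. A deployed model would
    # store the int8 tensor plus the scale instead of the dequantized float copy.
    scale = w.abs().max() / 127.0
    w_int8 = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    w.copy_(w_int8.float() * scale)

    # Low-rank factorization: approximate W (128x256) by a rank-32 product.
    # In practice the dense layer would be replaced by two smaller layers.
    U, S, Vh = torch.linalg.svd(w, full_matrices=False)
    r = 32
    w_low_rank = (U[:, :r] * S[:r]) @ Vh[:r, :]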
Latest papers with no code
Optimal Policy Sparsification and Low Rank Decomposition for Deep Reinforcement Learning
The results suggest that our custom $L_0$-norm-regularization technique for sparsification of DRL policies is a promising avenue to reduce computational resources and limit overfitting.
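The paper's own decomposition and $L_0$-based sparsification are not reproduced here; the sketch below only illustrates the generic idea of low-rank decomposition of a policy layer, replacing one linear layer with two smaller ones via truncated SVD. The function name, layer sizes, and rank are assumptions for illustration.

import torch
import torch.nn as nn

def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a Linear layer with two smaller ones via truncated SVD (illustrative)."""
    U, S, Vh = torch.linalg.svd(linear.weight.detach(), full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # (out_features, rank)
    V_r = Vh[:rank, :]             # (rank, in_features)

    first = nn.Linear(linear.in_features, rank, bias=False)
    second = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if linear.bias is not None:
        second.bias.data.copy_(linear.bias.detach())
    return nn.Sequential(first, second)

# Example: a hidden layer of a small policy network, with the rank set by a storage budget.
policy_layer = nn.Linear(256, 256)
compressed = low_rank_factorize(policy_layer, rank=32)
# Parameter count drops from 256*256 = 65,536 to 256*32 + 32*256 = 16,384 (plus bias).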
Towards efficient deep autoencoders for multivariate time series anomaly detection
First, pruning reduces the number of weights while avoiding catastrophic drops in accuracy, thanks to a fast search process that identifies high sparsity levels.
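The paper's actual search procedure is not described in this excerpt; the sketch below shows one generic way such a sparsity search could look, binary-searching for the highest global magnitude-pruning level whose score stays within a tolerance of the dense model. The helper names, the tolerance, and the user-supplied evaluate callback are all assumptions.

import copy
import torch
import torch.nn as nn

def prune_global(model: nn.Module, sparsity: float) -> nn.Module:
    """Zero out the globally smallest-magnitude fraction of parameters (illustrative)."""
    pruned = copy.deepcopy(model)
    weights = torch.cat([p.abs().flatten() for p in pruned.parameters()])
    k = int(sparsity * weights.numel())
    if k == 0:
        return pruned
    threshold = weights.kthvalue(k).values
    with torch.no_grad():
        for p in pruned.parameters():
            p.mul_((p.abs() > threshold).float())
    return pruned

def search_sparsity(model, evaluate, tolerance=0.01, steps=6):
    """Binary-search the highest sparsity whose score stays within `tolerance`
    of the dense model's score. `evaluate` would run the model on held-out data."""
    baseline = evaluate(model)
    lo, hi, best = 0.0, 1.0, 0.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if evaluate(prune_global(model, mid)) >= baseline - tolerance:
            best, lo = mid, mid
        else:
            hi = mid
    return best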
Differentially Private Knowledge Distillation via Synthetic Text Generation
However, the increasing urgency of data privacy requires LLMs to be trained with Differential Privacy (DP) on private data.
FinGPT-HPC: Efficient Pretraining and Finetuning Large Language Models for Financial Applications with High-Performance Computing
However, the resulting model still consumes a large amount of GPU memory.
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning
Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression.
From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges
Generative Artificial Intelligence (AI) has shown tremendous promise across all aspects of technology, including design.
Towards a tailored mixed-precision sub-8-bit quantization scheme for Gated Recurrent Units using Genetic Algorithms
Despite the recent advances in model compression techniques for deep neural networks, deploying such models on ultra-low-power embedded devices still proves challenging.
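The genetic-algorithm search from this paper is not reproduced here; the sketch below only shows the mechanical part of a mixed-precision scheme, fake-quantizing individual GRU weight tensors to assumed sub-8-bit widths. The fake_quantize helper and the bit_widths mapping are hypothetical, standing in for whatever assignment such a search might propose.

import torch
import torch.nn as nn

def fake_quantize(w: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Symmetric uniform quantization to n_bits, then dequantization (illustrative)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp((w / scale).round(), -qmax, qmax) * scale

# A hypothetical per-tensor bit-width assignment for a small GRU.
gru = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
bit_widths = {"weight_ih_l0": 4, "weight_hh_l0": 6}

with torch.no_grad():
    for name, param in gru.named_parameters():
        if name in bit_widths:
            param.copy_(fake_quantize(param, bit_widths[name]))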
Extraction of nonlinearity in neural networks and model compression with Koopman operator
Nonlinearity plays a crucial role in deep neural networks.
Fast Vocabulary Transfer for Language Model Compression
Real-world business applications require a trade-off between language model performance and size.
Model Compression and Efficient Inference for Large Language Models: A Survey
However, large language models have two prominent characteristics compared to smaller models: (1) most compression algorithms require fine-tuning or even retraining the model after compression.