Model Compression
342 papers with code • 2 benchmarks • 4 datasets
Model Compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the proposed methods for compressing deep networks.
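As a concrete illustration of the first of these ideas, the sketch below applies magnitude-based pruning to a weight matrix with NumPy. It is a minimal, hypothetical example, not code from any paper listed here: weights with the smallest absolute values are zeroed out until a target sparsity is reached.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)
W_pruned = magnitude_prune(W, sparsity=0.9)
print(f"non-zero fraction: {np.count_nonzero(W_pruned) / W_pruned.size:.3f}")
```

In practice the pruned network is usually fine-tuned afterwards to recover accuracy, and the sparse weights are stored in a compressed format to realize the memory savings.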
Latest papers
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
For efficient OVS, we aim to achieve performance comparable to, or even better than, prior OVS works built on large vision-language foundation models, while using smaller models that incur lower training costs.
Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind
MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages proportionally to the language distribution of the model training datasets.
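The MBS procedure itself is described in the paper; the sketch below only illustrates the general idea of drawing calibration samples in proportion to an assumed language distribution. All names, corpora, and shares are made up for illustration.

```python
import random

def sample_calibration(corpora: dict[str, list[str]],
                       language_shares: dict[str, float],
                       n_samples: int, seed: int = 0) -> list[str]:
    """Draw calibration texts from each language in proportion to its
    (assumed known) share of the model's training-data distribution."""
    rng = random.Random(seed)
    total = sum(language_shares.values())
    samples = []
    for lang, share in language_shares.items():
        k = round(n_samples * share / total)
        samples.extend(rng.choices(corpora[lang], k=k))
    rng.shuffle(samples)
    return samples

corpora = {"en": ["The cat sat."], "es": ["El gato se sentó."], "zh": ["猫坐下了。"]}
shares = {"en": 0.6, "es": 0.25, "zh": 0.15}  # hypothetical training mix
print(sample_calibration(corpora, shares, n_samples=8))
```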
Are Compressed Language Models Less Subgroup Robust?
To reduce the inference cost of large language models, model compression is increasingly used to create smaller, scalable models.
Tiny Models are the Computational Saver for Large Models
By searching for and employing the most appropriate tiny model as the computational saver for a given large model, the proposed approaches work as a novel and generic method for model compression.
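The paper's model-search procedure is its own contribution; as a generic illustration of the underlying cascade idea, the sketch below routes inputs through a tiny model first and falls back to the large model only when the tiny model is unsure. The models and threshold here are hypothetical stand-ins.

```python
import numpy as np

def cascade_predict(x, tiny_model, large_model, threshold: float = 0.9):
    """Try the tiny model first; defer to the large model only when the
    tiny model's top-class probability is below the threshold."""
    probs = tiny_model(x)
    if probs.max() >= threshold:
        return int(probs.argmax()), "tiny"
    return int(large_model(x).argmax()), "large"

# Toy stand-ins for real models: both return class probabilities.
tiny_model = lambda x: np.array([0.95, 0.05]) if x > 0 else np.array([0.55, 0.45])
large_model = lambda x: np.array([0.1, 0.9])

print(cascade_predict(1.0, tiny_model, large_model))   # handled by the tiny model
print(cascade_predict(-1.0, tiny_model, large_model))  # deferred to the large model
```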
Adversarial Fine-tuning of Compressed Neural Networks for Joint Improvement of Robustness and Efficiency
We present experiments on two benchmark datasets showing that adversarial fine-tuning of compressed models can achieve robustness performance comparable to adversarially trained models, while also improving computational efficiency.
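As a toy illustration of adversarial fine-tuning in general (not the paper's setup or models), the sketch below performs FGSM-style training steps on a logistic-regression model: each step perturbs the input in the direction of the loss gradient's sign, then updates the weights on that adversarial example.

```python
import numpy as np

def fgsm_finetune_step(w, x, y, lr=0.1, eps=0.05):
    """One adversarial fine-tuning step for logistic regression."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    # FGSM: move x in the direction that increases the loss.
    grad_x = (sigmoid(w @ x) - y) * w
    x_adv = x + eps * np.sign(grad_x)
    # Standard gradient step on the adversarial example.
    grad_w = (sigmoid(w @ x_adv) - y) * x_adv
    return w - lr * grad_w

rng = np.random.default_rng(0)
w = rng.normal(size=4)
x, y = rng.normal(size=4), 1.0
for _ in range(100):
    w = fgsm_finetune_step(w, x, y)
print("p(y=1|x) after adversarial fine-tuning:", 1 / (1 + np.exp(-w @ x)))
```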
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
The advancements in Large Language Models (LLMs) have been hindered by their substantial sizes, which necessitate LLM compression methods for practical deployment.
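SVD-LLM's truncation-aware weighting is detailed in the paper; the sketch below shows only the plain truncated-SVD factorization that such methods build on, replacing one weight matrix with two low-rank factors so that the parameter count drops from m*n to r*(m+n).

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate W (m x n) with factors A (m x r) and B (r x n)
    via truncated SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # absorb singular values into A
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512)).astype(np.float32)
A, B = low_rank_factorize(W, rank=64)
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"params: {W.size} -> {A.size + B.size}, relative error: {err:.3f}")
```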
Bit-mask Robust Contrastive Knowledge Distillation for Unsupervised Semantic Hashing
In this paper, we propose an innovative Bit-mask Robust Contrastive knowledge Distillation (BRCD) method, specifically devised for the distillation of semantic hashing models.
DyCE: Dynamic Configurable Exiting for Deep Learning Compression and Scaling
Moreover, most current dynamic compression designs are monolithic and tightly integrated with base models, thereby complicating the adaptation to novel base models.
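As a generic sketch of the early-exit pattern that dynamic-exiting methods build on (not DyCE's configurable design itself), the code below attaches an exit head after each block and stops as soon as a prediction is confident enough. All shapes, models, and thresholds are invented for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_forward(x, blocks, exit_heads, threshold=0.85):
    """Run blocks sequentially; after each block, its exit head predicts.
    Stop as soon as the top-class probability clears the threshold."""
    h = x
    for i, (block, head) in enumerate(zip(blocks, exit_heads)):
        h = block(h)
        probs = softmax(head(h))
        if probs.max() >= threshold:
            return int(probs.argmax()), i  # exited early at block i
    return int(probs.argmax()), len(blocks) - 1  # reached the final block

rng = np.random.default_rng(0)
blocks = [lambda h, W=rng.normal(size=(8, 8)): np.tanh(W @ h) for _ in range(4)]
exit_heads = [lambda h, W=rng.normal(size=(3, 8)): W @ h for _ in range(4)]
print(early_exit_forward(rng.normal(size=8), blocks, exit_heads))
```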
"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach
Modern deep neural networks (DNNs) are extremely powerful; however, this power comes at the price of greater depth and more parameters per layer, making their training and inference more computationally challenging.
PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning
Additionally, to adjust for the impact of inaccuracies in multimedia data, a disentangled multi-modal list-wise distillation is developed with a modality-aware re-weighting mechanism.
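The disentangled, prompt-tuned formulation is specific to the paper; the sketch below only illustrates the generic shape of a modality-aware, re-weighted list-wise distillation loss, with every name, score, and weight hypothetical: per modality, a KL divergence between the teacher's and student's ranking distributions over an item list, combined with modality weights.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def listwise_kd_loss(teacher_scores, student_scores, modality_weights):
    """Weighted sum over modalities of KL(teacher || student) between
    ranking distributions over the same item list."""
    loss = 0.0
    for m, w in modality_weights.items():
        p_t = softmax(teacher_scores[m])
        p_s = softmax(student_scores[m])
        loss += w * np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    return loss

teacher = {"image": np.array([2.0, 0.5, -1.0]), "text": np.array([1.0, 1.2, -0.3])}
student = {"image": np.array([1.5, 0.7, -0.8]), "text": np.array([0.2, 1.5, 0.0])}
weights = {"image": 0.7, "text": 0.3}  # e.g. down-weight a noisier modality
print(f"list-wise KD loss: {listwise_kd_loss(teacher, student, weights):.4f}")
```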