Model Compression

342 papers with code • 2 benchmarks • 4 datasets

Model Compression has been an actively pursued area of research in recent years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization and weight quantization are among the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
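
To make two of the primitives named above concrete, here is a minimal NumPy sketch of magnitude pruning and symmetric INT8 weight quantization on a random matrix. It is illustrative only and not tied to any paper listed below; real pipelines operate on trained models and typically add finetuning or calibration.

```python
# Minimal sketch of two compression primitives: magnitude pruning and
# symmetric uniform INT8 weight quantization. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)

# Parameter pruning: zero the 90% of weights with the smallest magnitude.
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0).astype(np.float32)

# Weight quantization: map remaining weights to INT8 with a single scale.
scale = np.abs(W_pruned).max() / 127.0
W_int8 = np.clip(np.round(W_pruned / scale), -127, 127).astype(np.int8)
W_hat = W_int8.astype(np.float32) * scale      # dequantized for inference

print(f"sparsity: {(W_pruned == 0).mean():.0%}, "
      f"max quantization error: {np.abs(W_pruned - W_hat).max():.4f}")
```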

PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning

hkuds/promptmm 27 Feb 2024

Additionally, to adjust for the impact of inaccuracies in multimedia data, a disentangled multi-modal list-wise distillation is developed with a modality-aware re-weighting mechanism.

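As a hedged sketch of what a modality-aware, list-wise distillation term can look like: per modality, match the student's ranking distribution over a candidate list to the teacher's, and weight modalities by reliability. The loss and the fixed weights below are illustrative stand-ins, not PromptMM's actual formulation.

```python
# Illustrative list-wise distillation with modality-aware re-weighting.
# The weights here are fixed constants; PromptMM adapts them.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def listwise_kd(teacher, student, weights):
    """teacher/student: dict modality -> (batch, list_len) ranking scores."""
    loss = 0.0
    for m, w in weights.items():
        t, s = softmax(teacher[m]), softmax(student[m])
        kl = np.sum(t * (np.log(t + 1e-9) - np.log(s + 1e-9)), axis=-1)
        loss += w * kl.mean()
    return loss

rng = np.random.default_rng(0)
scores = lambda: {m: rng.normal(size=(32, 20)) for m in ("visual", "textual")}
print(listwise_kd(scores(), scores(), {"visual": 0.3, "textual": 0.7}))
```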

LLM Inference Unveiled: Survey and Roofline Model Insights

hahnyuan/llmviewer 26 Feb 2024

Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model for the systematic analysis of LLM inference techniques.

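For a flavor of roofline-style reasoning: a kernel's attainable throughput is min(peak compute, memory bandwidth × arithmetic intensity). The toy calculation below uses assumed, ballpark A100-class hardware numbers (not values from the survey) to show why single-token LLM decoding is memory-bound.

```python
# Toy roofline calculation; PEAK_FLOPS and MEM_BW are assumed figures.
PEAK_FLOPS = 312e12   # FP16 tensor-core peak, FLOP/s
MEM_BW = 2.0e12       # HBM bandwidth, bytes/s

def attainable_flops(flops, bytes_moved):
    ai = flops / bytes_moved                      # arithmetic intensity
    return min(PEAK_FLOPS, MEM_BW * ai), ai

# Decoding one token of a 7B-parameter FP16 model reads every weight once:
# ~2 FLOPs and ~2 bytes per parameter, so AI ~= 1 FLOP/byte.
perf, ai = attainable_flops(flops=2 * 7e9, bytes_moved=2 * 7e9)
print(f"AI = {ai:.1f} FLOP/byte -> {perf / 1e12:.1f} TFLOP/s attainable "
      f"(vs {PEAK_FLOPS / 1e12:.0f} TFLOP/s peak): memory-bound")
```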

A Survey on Knowledge Distillation of Large Language Models

tebmer/awesome-knowledge-distillation-of-llms 20 Feb 2024

In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and Mistral.

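For reference, the classic logit-matching objective underlying much of this literature is simple; below is a minimal NumPy sketch of temperature-scaled distillation in the style of Hinton et al., not the survey's own algorithm. White-box access to teacher logits is an assumption that proprietary teachers such as GPT-4 do not satisfy; black-box KD instead fine-tunes the student on teacher-generated text.

```python
# Minimal sketch of temperature-scaled logit distillation. Assumes
# white-box access to teacher logits (not available for GPT-4-class APIs).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over softened distributions, scaled by T^2."""
    t = softmax(teacher_logits / T)
    s = softmax(student_logits / T)
    kl = np.sum(t * (np.log(t + 1e-9) - np.log(s + 1e-9)), axis=-1)
    return (T ** 2) * kl.mean()

# Toy usage: a batch of 4 positions over a 10-token vocabulary.
rng = np.random.default_rng(0)
print(distillation_loss(rng.normal(size=(4, 10)), rng.normal(size=(4, 10))))
```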

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

hatchetProject/QuEST 6 Feb 2024

Diffusion models have achieved remarkable success in image generation tasks, yet their practical deployment is constrained by high memory and time consumption.

The Potential of AutoML for Recommender Systems

isg-siegen/automl_for_recommender_systems 6 Feb 2024

We found that AutoML and AutoRecSys libraries performed best.

Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward

nyunai/faster-llm-survey 2 Feb 2024

Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference.

LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection

stiphyjay/lidar-ptq 29 Jan 2024

To our knowledge, for the very first time in LiDAR-based 3D detection tasks, the accuracy of the PTQ INT8 model is almost the same as that of the FP32 model, while enjoying a $3\times$ inference speedup.

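Post-training quantization in general chooses quantization parameters from a small calibration set, with no retraining. The sketch below shows a generic percentile-clipping calibration; it is not LiDAR-PTQ's specific method, which adds task-specific components for sparse point-cloud detectors.

```python
# Generic PTQ sketch: pick an INT8 scale from calibration activations via
# percentile clipping. LiDAR-PTQ's actual pipeline is not reproduced here.
import numpy as np

def calibrate_scale(calib, percentile=99.9):
    return np.percentile(np.abs(calib), percentile) / 127.0

def quant_dequant(x, scale):
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
acts = rng.standard_normal(100_000).astype(np.float32) * 0.5  # stand-in data
scale = calibrate_scale(acts)
err = np.abs(acts - quant_dequant(acts, scale)).mean()
print(f"scale = {scale:.5f}, mean |error| = {err:.5f}")
```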

TQCompressor: improving tensor decomposition methods in neural networks via permutations

terra-quantum-public/tqcompressedgpt2 29 Jan 2024

The result of the compression is the TQCompressedGPT-2 model, featuring 81 mln. parameters.

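As background for the entry above, here is a toy low-rank factorization via truncated SVD on a synthetic, nearly low-rank matrix. TQCompressor's contribution is a permutation step applied before decomposition, which this sketch deliberately omits.

```python
# Toy low-rank weight factorization via truncated SVD. The permutation
# search that TQCompressor adds before decomposition is omitted here.
import numpy as np

rng = np.random.default_rng(0)
L, R = rng.normal(size=(768, 64)), rng.normal(size=(64, 768))
W = (L @ R + 0.1 * rng.normal(size=(768, 768))).astype(np.float32)

U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 64
A = U[:, :r] * S[:r]          # (768, r): left factor scaled by singular values
B = Vt[:r, :]                 # (r, 768): right factor

ratio = W.size / (A.size + B.size)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"{ratio:.1f}x fewer parameters, relative error {rel_err:.3f}")
```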

Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation

FederatedML/FedCompress 25 Jan 2024

Federated Learning (FL) is a promising technique for the collaborative training of deep neural networks across multiple devices while preserving data privacy.

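Weight clustering reduces what each client must upload: per-weight indices into a small codebook instead of full-precision values. Below is a generic 1-D k-means sketch of that idea; the paper's adaptive cluster selection and server-side distillation are not reproduced.

```python
# Generic weight-clustering sketch for communication-efficient FL: run 1-D
# k-means over a client's weights, then transmit uint8 indices plus the
# k-entry codebook. FedCompress's adaptive scheme is not reproduced here.
import numpy as np

def cluster_weights(w, k=16, iters=20):
    centroids = np.quantile(w, np.linspace(0.0, 1.0, k))   # spread-out init
    for _ in range(iters):
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(k):
            members = w[idx == c]
            if members.size:
                centroids[c] = members.mean()
    return idx.astype(np.uint8), centroids.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)
idx, codebook = cluster_weights(w)
mse = np.mean((w - codebook[idx]) ** 2)
print(f"~{int(np.log2(len(codebook)))} bits/weight instead of 32, MSE {mse:.5f}")
```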

Model Compression Techniques in Biometrics Applications: A Survey

eduardacaldeira/compression_bias_survey 18 Jan 2024

The development of deep learning algorithms has greatly expanded humanity's capacity to automate tasks.
