Model Compression

342 papers with code • 2 benchmarks • 4 datasets

Model Compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power and resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
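The three families of techniques mentioned above can be illustrated with a small, self-contained sketch. The snippet below is not taken from any of the listed papers; it is a minimal NumPy illustration of magnitude-based parameter pruning, low-rank factorization via truncated SVD, and symmetric 8-bit weight quantization applied to a toy weight matrix (the matrix size, sparsity threshold, and rank k are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)  # a toy weight matrix

# 1) Parameter pruning: zero out the smallest-magnitude weights (here 80% sparsity).
threshold = np.quantile(np.abs(W), 0.8)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# 2) Low-rank factorization: keep only the top-k singular components of W.
k = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W_lowrank = (U[:, :k] * S[:k]) @ Vt[:k, :]   # rank-k approximation

# 3) Weight quantization: symmetric uniform quantization to signed 8-bit integers.
scale = np.abs(W).max() / 127.0
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_int8.astype(np.float32) * scale

print("sparsity:", (W_pruned == 0).mean())
print("rank-k reconstruction error:", np.linalg.norm(W - W_lowrank) / np.linalg.norm(W))
print("quantization error:", np.linalg.norm(W - W_dequant) / np.linalg.norm(W))
```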

Latest papers with no code

Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations

no code yet • 2 Apr 2024

Model compression is therefore important for retaining the performance of larger models while reducing the cost of running them.

Instance-Aware Group Quantization for Vision Transformers

no code yet • 1 Apr 2024

In particular, the distribution of activations for each channel varies drastically across input instances, making PTQ methods designed for CNNs inappropriate for ViTs.
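As a rough illustration of why per-instance statistics matter for activation quantization (this is not the method proposed in the paper), the sketch below recomputes group-wise quantization scales from each input instance, so inputs with very different ranges get different quantizers:

```python
import numpy as np

def group_quantize(x, num_groups=4, num_bits=8):
    """Quantize activations per instance and per channel group.

    x: (batch, channels) activations. Scales are computed from the current
    instance, so two inputs with different ranges get different quantizers.
    """
    batch, channels = x.shape
    assert channels % num_groups == 0
    qmax = 2 ** (num_bits - 1) - 1
    groups = x.reshape(batch, num_groups, channels // num_groups)
    # One scale per (instance, group), derived from that slice's max magnitude.
    scale = np.abs(groups).max(axis=-1, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-8)                      # avoid division by zero
    q = np.clip(np.round(groups / scale), -qmax, qmax)
    return (q * scale).reshape(batch, channels)          # dequantized activations

# Two instances with very different channel statistics share the same layer
# but receive different quantization scales.
rng = np.random.default_rng(1)
x = np.stack([rng.normal(scale=0.1, size=64), rng.normal(scale=5.0, size=64)])
x_q = group_quantize(x, num_groups=4)
print(np.abs(x - x_q).max(axis=1))  # per-instance quantization error
```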

Dense Vision Transformer Compression with Few Samples

no code yet • 27 Mar 2024

In particular, traditional few-shot CNN compression methods suffer from sparse compression: they can only produce a very small number of compressed models at different model sizes.

Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation

no code yet • 27 Mar 2024

Moreover, we propose a method that allows the transfer of modules between incompatible PLMs without any change in inference complexity.
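For readers unfamiliar with the distillation side of this line of work, the following is a minimal sketch of a standard soft-label knowledge distillation loss; it does not implement the module-transfer method described in the paper, and the temperature and mixing weight are illustrative:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-label knowledge distillation: KL(teacher || student) at temperature T,
    mixed with the usual hard-label cross-entropy on the student."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kd = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1).mean() * T * T
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * kd + (1 - alpha) * ce

rng = np.random.default_rng(2)
teacher = rng.normal(size=(8, 10))                 # logits from a larger model
student = rng.normal(size=(8, 10))                 # logits from a compact model
labels = rng.integers(0, 10, size=8)
print(distillation_loss(student, teacher, labels))
```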

Chain of Compression: A Systematic Approach to Combinationally Compress Convolutional Neural Networks

no code yet • 26 Mar 2024

Convolutional neural networks (CNNs) have achieved significant popularity, but their computational and memory intensity poses challenges for resource-constrained computing systems, particularly when real-time performance is required.

Magic for the Age of Quantized DNNs

no code yet • 22 Mar 2024

Recently, the number of parameters in DNNs has increased explosively, as exemplified by large language models (LLMs), making inference on small-scale computers more difficult.

Advancing IIoT with Over-the-Air Federated Learning: The Role of Iterative Magnitude Pruning

no code yet • 21 Mar 2024

Targeting compact yet robust DNN models, we propose integrating iterative magnitude pruning (IMP) into the training of the DNN model in an over-the-air FL (OTA-FL) environment for IIoT.
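The IMP loop itself, independent of the over-the-air FL setting, can be sketched as follows; the fine-tuning step is a placeholder (a real client would run SGD under the mask), and the linear pruning schedule is an arbitrary choice:

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Mask that keeps only the largest-magnitude weights at the given global sparsity."""
    flat = np.concatenate([np.abs(w).ravel() for w in weights])
    threshold = np.quantile(flat, sparsity)
    return [np.abs(w) >= threshold for w in weights]

def iterative_magnitude_pruning(weights, finetune_fn, target_sparsity=0.9, rounds=5):
    """Raise sparsity gradually: prune a little, fine-tune under the mask, repeat."""
    masks = [np.ones_like(w, dtype=bool) for w in weights]
    for r in range(1, rounds + 1):
        sparsity = target_sparsity * r / rounds           # linear pruning schedule
        masks = magnitude_mask(weights, sparsity)
        weights = [w * m for w, m in zip(weights, masks)]
        weights = finetune_fn(weights, masks)             # keep pruned weights at zero
    return weights, masks

# Stand-in for local training; a real loop would update the surviving weights.
def fake_finetune(weights, masks):
    return [w * m for w, m in zip(weights, masks)]

rng = np.random.default_rng(3)
layers = [rng.normal(size=(128, 128)), rng.normal(size=(128, 10))]
pruned, masks = iterative_magnitude_pruning(layers, fake_finetune)
print([float((w == 0).mean()) for w in pruned])           # per-layer sparsity
```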

DiPaCo: Distributed Path Composition

no code yet • 15 Mar 2024

Progress in machine learning (ML) has been fueled by scaling neural network models.

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

no code yet • 14 Mar 2024

Consequently, a simple combination of them cannot guarantee accomplishing both training efficiency and inference efficiency with minimal costs.

BRIEDGE: EEG-Adaptive Edge AI for Multi-Brain to Multi-Robot Interaction

no code yet • 14 Mar 2024

To better extract the joint features of heterogeneous EEG data as well as enhance classification accuracy, BRIEDGE introduces an Informer-based ProbSparse self-attention mechanism.
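For context only, a heavily simplified sketch of the ProbSparse idea from Informer is shown below: only the top-u queries with the most peaked score distributions receive full attention, while the remaining positions fall back to the mean of the values. This omits the query sampling that gives the real mechanism its sub-quadratic cost and is not the BRIEDGE implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def probsparse_attention(Q, K, V, u):
    """Simplified ProbSparse-style self-attention.

    Only the u queries with the most "peaked" score distributions get full
    attention; the remaining positions fall back to the mean of V.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # (Lq, Lk) attention scores
    # Sparsity measure: max score minus mean score for each query.
    sparsity = scores.max(axis=-1) - scores.mean(axis=-1)
    top = np.argsort(-sparsity)[:u]                     # indices of the top-u queries
    out = np.repeat(V.mean(axis=0, keepdims=True), Q.shape[0], axis=0)
    out[top] = softmax(scores[top]) @ V                 # full attention for active queries
    return out

rng = np.random.default_rng(4)
L, d = 16, 8
Q, K, V = rng.normal(size=(L, d)), rng.normal(size=(L, d)), rng.normal(size=(L, d))
print(probsparse_attention(Q, K, V, u=4).shape)         # (16, 8)
```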