Model Compression
342 papers with code • 2 benchmarks • 4 datasets
Model Compression has been an actively pursued area of research in recent years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed for reducing the size of deep networks.
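Two of the techniques named above can be illustrated in a few lines. The sketch below shows unstructured magnitude pruning (zeroing the smallest-magnitude weights) and symmetric uniform weight quantization; function names and the NumPy formulation are illustrative, not drawn from any specific paper.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def uniform_quantize(weights, num_bits=8):
    """Symmetric uniform quantization to num_bits; returns de-quantized
    ("fake-quantized") values so the rounding error is visible directly."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    if scale == 0:
        return weights.copy()
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale
```

Pruning stores a sparse weight set; quantization trades precision for bit-width. In practice the two are often combined, and the network is fine-tuned afterwards to recover accuracy.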
Latest papers with no code
Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations
Model compression is therefore important for retaining the performance of larger models while reducing the cost of running them.
Instance-Aware Group Quantization for Vision Transformers
In particular, the distribution of activations for each channel varies drastically with the input instance, making PTQ methods designed for CNNs inappropriate for ViTs.
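The observation above, that per-channel activation statistics shift from one input to the next, motivates computing quantization parameters at runtime over groups of channels rather than fixing one static scale. A minimal NumPy sketch of dynamic per-group activation quantization follows; this is an illustration of the general idea, not the algorithm from the paper.

```python
import numpy as np

def group_quantize_activations(x, num_groups=4, num_bits=8):
    """Quantize activations x of shape (channels, features).
    Channels are split into groups, and each group gets its own scale
    computed from the *current* input, so scales track the instance."""
    qmax = 2 ** (num_bits - 1) - 1
    out = np.empty_like(x, dtype=np.float64)
    for g in np.array_split(np.arange(x.shape[0]), num_groups):
        scale = np.max(np.abs(x[g])) / qmax
        if scale == 0:
            out[g] = 0.0
            continue
        out[g] = np.clip(np.round(x[g] / scale), -qmax, qmax) * scale
    return out
```

With a single global scale, small-magnitude channels would be crushed to zero whenever another channel produces a large outlier; per-group scales limit that damage.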
Dense Vision Transformer Compression with Few Samples
In particular, traditional few-shot CNN methods suffer from sparse compression: they can only produce a very small number of compressed models of different sizes.
Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation
Moreover, we propose a method that allows the transfer of modules between incompatible PLMs without any change in inference complexity.
Chain of Compression: A Systematic Approach to Combinationally Compress Convolutional Neural Networks
Convolutional neural networks (CNNs) have achieved significant popularity, but their computational and memory intensity poses challenges for resource-constrained computing systems, particularly when real-time performance is required.
Magic for the Age of Quantized DNNs
Recently, the number of parameters in DNNs has increased explosively, as exemplified by Large Language Models (LLMs), making inference on small-scale computers more difficult.
Advancing IIoT with Over-the-Air Federated Learning: The Role of Iterative Magnitude Pruning
Targeting the notion of compact yet robust DNN models, we propose the integration of iterative magnitude pruning (IMP) of the DNN model being trained in an over-the-air FL (OTA-FL) environment for IIoT.
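Iterative magnitude pruning, mentioned above, removes a fraction of the smallest-magnitude remaining weights each round, optionally retraining in between, so that the pruning mask accumulates toward a target sparsity. A hedged sketch (the schedule and the `retrain` hook are illustrative assumptions, not the paper's OTA-FL procedure):

```python
import numpy as np

def iterative_magnitude_pruning(weights, target_sparsity=0.9, rounds=5, retrain=None):
    """Sketch of IMP: each round prunes a fixed fraction of the surviving
    weights, then optionally fine-tunes; the mask accumulates across rounds."""
    mask = np.ones_like(weights, dtype=bool)
    # Per-round keep ratio chosen so keep**rounds == 1 - target_sparsity.
    per_round_keep = (1.0 - target_sparsity) ** (1.0 / rounds)
    for _ in range(rounds):
        alive = np.abs(weights[mask])
        k = int(len(alive) * (1.0 - per_round_keep))
        if k > 0:
            threshold = np.partition(alive, k - 1)[k - 1]
            mask &= np.abs(weights) > threshold
        weights = weights * mask
        if retrain is not None:
            weights = retrain(weights, mask)  # fine-tune surviving weights only
    return weights, mask
```

Pruning gradually over several rounds, rather than all at once, is what typically lets the network recover accuracy between pruning steps.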
DiPaCo: Distributed Path Composition
Progress in machine learning (ML) has been fueled by scaling neural network models.
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Consequently, a simple combination of the two cannot guarantee both training efficiency and inference efficiency at minimal cost.
BRIEDGE: EEG-Adaptive Edge AI for Multi-Brain to Multi-Robot Interaction
To better extract the joint features of heterogeneous EEG data as well as enhance classification accuracy, BRIEDGE introduces an informer-based ProbSparse self-attention mechanism.