Quantization

1046 papers with code • 10 benchmarks • 18 datasets

Quantization is a promising technique for reducing the computation cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
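
As a concrete illustration of the idea, the snippet below is a minimal sketch of symmetric per-tensor quantization (not tied to any specific paper listed here): a float32 array is mapped to int8 with a single scale factor and then dequantized to check the rounding error. All function and variable names are illustrative assumptions.

```python
# Minimal sketch of symmetric uniform quantization: a float32 array is mapped
# to int8 with one per-tensor scale, then dequantized back for comparison.
# Names are illustrative, not from any particular library.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map a float32 array to int8 using a symmetric per-tensor scale."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 array from the int8 codes."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
print("max abs error:", np.abs(x - x_hat).max())
```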

Latest papers with no code

Latency-Distortion Tradeoffs in Communicating Classification Results over Noisy Channels

no code yet • 22 Apr 2024

Our results show that there is an interesting interplay between source distortion (i.e., distortion for the probability vector measured via f-divergence) and the subsequent channel encoding/decoding parameters, and indicate that a joint design of these parameters is crucial to navigate the latency-distortion tradeoff.

AdaQAT: Adaptive Bit-Width Quantization-Aware Training

no code yet • 22 Apr 2024

Compared to other methods that are generally designed to be run on a pretrained network, AdaQAT works well in both training from scratch and fine-tuning scenarios. Initial results on the CIFAR-10 and ImageNet datasets using ResNet20 and ResNet18 models, respectively, indicate that our method is competitive with other state-of-the-art mixed-precision quantization approaches.
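
For context, the snippet below is a hedged sketch of the generic quantization-aware training building block that methods such as AdaQAT extend: a fake-quantize step with a straight-through estimator so gradients still reach the full-precision weights. It uses a fixed bit-width and is not the AdaQAT algorithm; the use of PyTorch and all names are assumptions for illustration.

```python
# Hedged sketch of quantization-aware training with a "fake quantize" step and a
# straight-through estimator. This is NOT the AdaQAT algorithm: the bit-width
# here is fixed rather than adaptively learned.
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Round x onto a uniform grid of 2**num_bits levels, letting gradients pass through."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    # Straight-through estimator: forward uses x_q, backward sees the identity.
    return x + (x_q - x).detach()

# Toy usage: quantize the weights of a linear layer during the forward pass.
layer = torch.nn.Linear(16, 4)
inp = torch.randn(2, 16)
out = inp @ fake_quantize(layer.weight, num_bits=4).t() + layer.bias
out.sum().backward()               # gradients still flow to layer.weight
print(layer.weight.grad.shape)
```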

FedMPQ: Secure and Communication-Efficient Federated Learning with Multi-codebook Product Quantization

no code yet • 21 Apr 2024

In federated learning, particularly in cross-device scenarios, secure aggregation has recently gained popularity as it effectively defends against inference attacks by malicious aggregators.
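
As background, the following is a minimal sketch of plain product quantization applied to a client update vector: the vector is split into sub-vectors and each sub-vector is replaced by the index of its nearest codeword in its own codebook. The multi-codebook construction and the secure-aggregation protocol of FedMPQ are not reproduced here; the random codebooks and all names are illustrative assumptions.

```python
# Hedged sketch of product quantization as a compression step for a model update:
# each sub-vector is replaced by one codeword index. Codebooks are random for
# illustration; FedMPQ's secure aggregation is not shown.
import numpy as np

def pq_encode(update: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Return one codeword index per sub-vector (shape: [num_subvectors])."""
    num_sub, num_codes, sub_dim = codebooks.shape
    subs = update.reshape(num_sub, sub_dim)
    # For each sub-vector, pick the closest codeword in its own codebook.
    dists = ((subs[:, None, :] - codebooks) ** 2).sum(-1)   # [num_sub, num_codes]
    return dists.argmin(axis=1)

def pq_decode(indices: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Reconstruct the vector by concatenating the selected codewords."""
    return np.concatenate([codebooks[i, idx] for i, idx in enumerate(indices)])

rng = np.random.default_rng(0)
codebooks = rng.normal(size=(8, 256, 4))       # 8 sub-vectors, 256 codewords, dim 4
update = rng.normal(size=32).astype(np.float32)
idx = pq_encode(update, codebooks)             # 8 small indices instead of 32 floats
print("reconstruction error:", np.linalg.norm(update - pq_decode(idx, codebooks)))
```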

HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression

no code yet • 20 Apr 2024

This paper investigates the challenging problem of learned image compression (LIC) at extremely low bitrates.

EdgeFusion: On-Device Text-to-Image Generation

no code yet • 18 Apr 2024

The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle for its practical application.

Privacy-Preserving UCB Decision Process Verification via zk-SNARKs

no code yet • 18 Apr 2024

With the increasingly widespread application of machine learning, it has long been a challenge to strike a balance between protecting the privacy of data and algorithm parameters and ensuring the verifiability of machine learning.

LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory

no code yet • 17 Apr 2024

Transformer models have been successful in various sequence processing tasks, but the self-attention mechanism's computational cost limits its practicality for long sequences.
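
As background, the snippet below sketches the basic vector-quantization step that such models build on: each feature vector is snapped to its nearest entry in a codebook (the "structured memory"), with a straight-through gradient. This is a generic VQ layer, not the LongVQ architecture; the use of PyTorch and the tensor sizes are illustrative assumptions.

```python
# Hedged sketch of a generic vector-quantization (codebook lookup) step with a
# straight-through gradient. Not the LongVQ architecture itself.
import torch

def vector_quantize(x: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Replace each row of x [N, D] with its nearest codebook vector from [K, D]."""
    dists = torch.cdist(x, codebook)          # [N, K] pairwise distances
    idx = dists.argmin(dim=1)                 # nearest codeword per input row
    quantized = codebook[idx]
    # Straight-through estimator so gradients still reach the inputs.
    return x + (quantized - x).detach()

codebook = torch.randn(512, 64)                          # 512 "memory" slots of dim 64
x = torch.randn(1024, 64, requires_grad=True)            # a long sequence of features
y = vector_quantize(x, codebook)
y.sum().backward()
print(x.grad.shape)   # gradients flow back to the sequence features
```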

Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems

no code yet • 17 Apr 2024

Simulating the dynamics of open quantum systems coupled to non-Markovian environments remains an outstanding challenge due to exponentially scaling computational costs.

QGen: On the Ability to Generalize in Quantization Aware Training

no code yet • 17 Apr 2024

In this work, we investigate the generalization properties of quantized neural networks, a characteristic that has received little attention despite its implications on model performance.

Comprehensive Survey of Model Compression and Speed up for Vision Transformers

no code yet • 16 Apr 2024

Vision Transformers (ViT) have marked a paradigm shift in computer vision, outperforming state-of-the-art models across diverse tasks.