Quantization

1039 papers with code • 10 benchmarks • 18 datasets

Quantization is a promising technique for reducing the computation cost of neural network training by replacing high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
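
For readers new to the area, here is a minimal NumPy sketch of uniform affine quantization to int8 and back. It is illustrative only, not the scheme from the cited paper, and the helper names are made up.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Uniform affine quantization of a float32 array to int8.
    Returns the int8 values and the (scale, zero_point) needed to
    map them back to approximate float32."""
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin) + 1e-12
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map int8 values back to float32."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_int8(x)
print(np.abs(x - dequantize(q, s, z)).max())  # worst-case quantization error
```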

How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

macaronlin/llama3-quantization • 22 Apr 2024

This exploration holds the potential to unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing the performance degradation that arises in LLM compression.

MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA based Mixture of Experts

TUDB-Labs/MixLoRA • 22 Apr 2024

Unlike other LoRA-based MoE methods, MixLoRA enhances model performance by using independently configurable attention-layer LoRA adapters, supporting LoRA and its variants for constructing the experts, and applying an auxiliary load-balancing loss to address the router's imbalance problem.
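
The mechanism described above can be illustrated with a generic PyTorch sketch of a LoRA-based mixture of experts with an auxiliary load-balancing loss. This is not the MixLoRA implementation (see TUDB-Labs/MixLoRA for that); the balance penalty here is deliberately simplified, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One low-rank adapter: delta(x) = B(A(x)), with A and B trainable."""
    def __init__(self, in_dim: int, out_dim: int, rank: int = 8):
        super().__init__()
        self.A = nn.Linear(in_dim, rank, bias=False)
        self.B = nn.Linear(rank, out_dim, bias=False)
        nn.init.zeros_(self.B.weight)  # adapters start as a no-op

    def forward(self, x):
        return self.B(self.A(x))

class LoRAMoE(nn.Module):
    """Frozen base linear layer plus a router over several LoRA experts."""
    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.router = nn.Linear(base.in_features, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [LoRAExpert(base.in_features, base.out_features, rank)
             for _ in range(num_experts)]
        )

    def forward(self, x):
        gates = F.softmax(self.router(x), dim=-1)                 # (..., E)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (..., D, E)
        mixed = (outs * gates.unsqueeze(-2)).sum(dim=-1)          # (..., D)
        # Simplified auxiliary load-balance loss: penalize uneven average
        # gate usage so the router does not collapse onto one expert.
        usage = gates.reshape(-1, gates.size(-1)).mean(dim=0)
        aux_loss = gates.size(-1) * (usage ** 2).sum()
        return self.base(x) + mixed, aux_loss
```

During training, the auxiliary term is typically added to the task loss with a small weight, e.g. `loss = task_loss + 0.01 * aux_loss`.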

MAexp: A Generic Platform for RL-based Multi-Agent Exploration

duangzhu/maexp • 19 Apr 2024

The sim-to-real gap poses a significant challenge in RL-based multi-agent exploration due to scene quantization and action discretization.

decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points

bytedance/decoupleq • 19 Apr 2024

However, existing quantization schemes suffer significant accuracy degradation at very low bit-widths or require additional computational overhead when deployed, making them difficult to apply in large-scale industrial settings.
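
As a rough illustration of the integer/floating-point split named in the title (the actual decoupleQ algorithm is different and more involved; see bytedance/decoupleq), the sketch below alternates between a 2-bit integer part and a per-row floating-point scale and shift fitted by least squares. All names are illustrative.

```python
import numpy as np

def decouple_2bit(w: np.ndarray, iters: int = 3):
    """Approximate each row of w as s * q + z, with q in {0, 1, 2, 3}
    (the integer part) and floating-point s, z (the float part)."""
    s = (w.max(1, keepdims=True) - w.min(1, keepdims=True)) / 3.0 + 1e-8
    z = w.min(1, keepdims=True)
    for _ in range(iters):
        q = np.clip(np.round((w - z) / s), 0, 3)  # integer part
        # Refit the float part (s, z) per row by ordinary least squares.
        q_mean, w_mean = q.mean(1, keepdims=True), w.mean(1, keepdims=True)
        cov = ((q - q_mean) * (w - w_mean)).mean(1, keepdims=True)
        var = ((q - q_mean) ** 2).mean(1, keepdims=True) + 1e-8
        s = cov / var
        z = w_mean - s * q_mean
    return q.astype(np.int8), s, z

w = np.random.randn(8, 64).astype(np.float32)
q, s, z = decouple_2bit(w)
print(np.abs(w - (s * q + z)).mean())  # mean reconstruction error
```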

Variational quantization for state space models

etidav/next • 17 Apr 2024

The main challenge is to model a rich variety of time series, leverage any available external signals and provide sharp predictions with statistical guarantees.

Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators

faceonlive/ai-research • 8 Apr 2024

The energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including the weight quantization strategy (i.e., data types and bit-widths) and the mapping (i.e., placement and scheduling of DNN elementary operations on the accelerator's hardware units).
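
To make the bit-width part of that trade-off concrete, here is a back-of-the-envelope Python sketch of weight storage versus data type. The 5-million-parameter CNN is a made-up example, and real accelerators must also account for activations, mapping, and padding.

```python
def weight_memory_mib(num_params: int, bits: int) -> float:
    """Storage needed for the weights alone at a given bit-width."""
    return num_params * bits / 8 / 2**20

# Hypothetical 5-million-parameter CNN under different weight data types.
for name, bits in [("float32", 32), ("int8", 8), ("int4", 4)]:
    print(f"{name:>7}: {weight_memory_mib(5_000_000, bits):6.2f} MiB")
```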

David and Goliath: An Empirical Evaluation of Attacks and Defenses for QNNs at the Deep Edge

faceonlive/ai-research • 8 Apr 2024

To fill this gap, we empirically evaluate the effectiveness of attacks and defenses from (full-precision) ANNs on (constrained) QNNs.

BinaryDM: Towards Accurate Binarization of Diffusion Model

xingyu-zheng/binarydm • 8 Apr 2024

With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs.
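
For context, binarization is the extreme 1-bit case of quantization. BinaryDM's actual scheme is more elaborate (see xingyu-zheng/binarydm), but the textbook operator it builds on replaces each weight row with its sign times a single floating-point scale, as sketched below.

```python
import numpy as np

def binarize(w: np.ndarray) -> np.ndarray:
    """1-bit weight binarization in the XNOR-Net style: sign(w) scaled by
    the mean absolute value of each row, which minimizes the L2 error."""
    alpha = np.abs(w).mean(axis=1, keepdims=True)  # per-row scale
    return alpha * np.sign(w)

w = np.random.randn(16, 64).astype(np.float32)
w_bin = binarize(w)
print(np.linalg.norm(w - w_bin) / np.linalg.norm(w))  # relative error
```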

Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging

thuccslab/mergeguard • 8 Apr 2024

Model merging is a promising lightweight model empowerment technique that does not rely on expensive computing devices (e.g., GPUs) or require the collection of specific training data.
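
For readers unfamiliar with the term, model merging in its simplest form is just a weighted average of parameters from fine-tuned checkpoints that share an architecture. The sketch below shows only that baseline recipe; the paper's threat model and merging variants are described in thuccslab/mergeguard, and the paths in the comment are hypothetical.

```python
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of several checkpoints with identical keys/shapes."""
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (hypothetical paths): average two fine-tuned models equally.
# merged = merge_state_dicts([torch.load("a.pt"), torch.load("b.pt")])
```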
