Search Results for author: Markus Nagel

Found 25 papers, 8 papers with code

GPTVQ: The Blessing of Dimensionality for LLM Quantization

no code implementations • 23 Feb 2024 • Mart van Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cedric Bastoul, Eric Mahurin, Tijmen Blankevoort, Paul Whatmough

In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality.

Quantization

Paper
Add Code

The LLM Surgeon

1 code implementation • 28 Dec 2023 • Tycho F. A. van der Ouderaa, Markus Nagel, Mart van Baalen, Yuki M. Asano, Tijmen Blankevoort

Experimentally, our method can prune rows and columns from a range of OPT models and Llamav2-7B by 20%-30%, with a negligible loss in performance, and achieve state-of-the-art results in unstructured and semi-structured pruning of large language models.

Paper
Code

MobileNVC: Real-time 1080p Neural Video Compression on a Mobile Device

no code implementations • 2 Oct 2023 • Ties van Rozendaal, Tushar Singhal, Hoang Le, Guillaume Sautiere, Amir Said, Krishna Buska, Anjuman Raha, Dimitris Kalatzis, Hitarth Mehta, Frank Mayer, Liang Zhang, Markus Nagel, Auke Wiggers

This work presents the first neural video codec that decodes 1080p YUV420 video in real time on a mobile device.

Motion Compensation Quantization +1

Paper
Add Code

Softmax Bias Correction for Quantized Generative Models

no code implementations • 4 Sep 2023 • Nilesh Prasad Pandey, Marios Fournarakis, Chirag Patel, Markus Nagel

Post-training quantization (PTQ) is the go-to compression technique for large generative models, such as stable diffusion or large language models.

Language Modelling Quantization

Paper
Add Code

ResQ: Residual Quantization for Video Perception

no code implementations • ICCV 2023 • Davide Abati, Haitam Ben Yahia, Markus Nagel, Amirhossein Habibian

Furthermore, we extend our model to dynamically adjust the bit-width proportional to the amount of changes in the video.

Optical Flow Estimation Pose Estimation +3

Paper
Add Code

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training

no code implementations • 10 Jul 2023 • Jorn Peters, Marios Fournarakis, Markus Nagel, Mart van Baalen, Tijmen Blankevoort

By combining fast-to-compute sensitivities with efficient solvers during QAT, QBitOpt can produce mixed-precision networks with high task performance guaranteed to satisfy strict resource constraints.

Quantization

Paper
Add Code

Pruning vs Quantization: Which is Better?

1 code implementation • NeurIPS 2023 • Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort

We provide an extensive comparison between the two techniques for compressing deep neural networks.

Network Pruning Quantization

Paper
Code

FP8 versus INT8 for efficient deep learning inference

no code implementations • 31 Mar 2023 • Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort

We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware-training results to show how this theory translates to practice.

Quantization

Paper
Add Code

A Practical Mixed Precision Algorithm for Post-Training Quantization

no code implementations • 10 Feb 2023 • Nilesh Prasad Pandey, Markus Nagel, Mart van Baalen, Yin Huang, Chirag Patel, Tijmen Blankevoort

We experimentally validate our proposed method on several computer vision tasks, natural language processing tasks and many different networks, and show that we can find mixed precision networks that provide a better trade-off between accuracy and efficiency than their homogeneous bit-width equivalents.

Quantization

Paper
Add Code

Quadapter: Adapter for GPT-2 Quantization

no code implementations • 30 Nov 2022 • Minseop Park, Jaeseong You, Markus Nagel, Simyung Chang

In that case, it is observed that quantization-aware training overfits the model to the fine-tuning data.

Quantization

Paper
Add Code

FP8 Quantization: The Power of the Exponent

1 code implementation • 19 Aug 2022 • Andrey Kuzmin, Mart van Baalen, Yuwei Ren, Markus Nagel, Jorn Peters, Tijmen Blankevoort

We detail the choices that can be made for the FP8 format, including the important choice of the number of bits for the mantissa and exponent, and show analytically in which settings these choices give better performance.

Quantization

Paper
Code

Quantized Sparse Weight Decomposition for Neural Network Compression

no code implementations • 22 Jul 2022 • Andrey Kuzmin, Mart van Baalen, Markus Nagel, Arash Behboodi

In this paper, we introduce a novel method of neural network weight compression.

Neural Network Compression Quantization

Paper
Add Code

Quantization Robust Federated Learning for Efficient Inference on Heterogeneous Devices

no code implementations • 22 Jun 2022 • Kartik Gupta, Marios Fournarakis, Matthias Reisser, Christos Louizos, Markus Nagel

We perform extensive experiments on standard FL benchmarks to evaluate our proposed FedAvg variants for quantization robustness and provide a convergence analysis for our Quantization-Aware variants in FL.

BIG-bench Machine Learning Federated Learning +1

Paper
Add Code

Overcoming Oscillations in Quantization-Aware Training

1 code implementation • 21 Mar 2022 • Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort

These effects are particularly pronounced in low-bit ($\leq$ 4-bits) quantization of efficient networks with depth-wise separable layers, such as MobileNets and EfficientNets.

Quantization

Paper
Code

Cyclical Pruning for Sparse Neural Networks

no code implementations • 2 Feb 2022 • Suraj Srinivas, Andrey Kuzmin, Markus Nagel, Mart van Baalen, Andrii Skliar, Tijmen Blankevoort

Current methods for pruning neural network weights iteratively apply magnitude-based pruning on the model weights and re-train the resulting model to recover lost accuracy.

Paper
Add Code

Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)

no code implementations • 20 Jan 2022 • Sangeetha Siddegowda, Marios Fournarakis, Markus Nagel, Tijmen Blankevoort, Chirag Patel, Abhijit Khobare

chapter 4) and quantization-aware training (QAT, cf.

Model Optimization Quantization

Paper
Add Code

Quantized sparse PCA for neural network weight compression

no code implementations • 29 Sep 2021 • Andrey Kuzmin, Mart van Baalen, Markus Nagel, Arash Behboodi

In this paper, we introduce a novel method of weight compression.

Quantization

Paper
Add Code

Understanding and Overcoming the Challenges of Efficient Transformer Quantization

1 code implementation • EMNLP 2021 • Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort

Finally, we show that transformer weights and embeddings can be quantized to ultra-low bit-widths, leading to significant memory savings with a minimum accuracy loss.

Quantization

166

Paper
Code

A White Paper on Neural Network Quantization

no code implementations • 15 Jun 2021 • Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort

Neural network quantization is one of the most effective ways of achieving these savings but the additional noise it induces can lead to accuracy degradation.

Quantization

Paper
Add Code

In-Hindsight Quantization Range Estimation for Quantized Training

no code implementations • 10 May 2021 • Marios Fournarakis, Markus Nagel

Quantization techniques applied to the inference of deep neural networks have enabled fast and efficient execution on resource-constraint devices.

Image Classification Quantization

Paper
Add Code

Bayesian Bits: Unifying Quantization and Pruning

1 code implementation • NeurIPS 2020 • Mart van Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling

We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization.

Quantization

Paper
Code

Up or Down? Adaptive Rounding for Post-Training Quantization

no code implementations • ICML 2020 • Markus Nagel, Rana Ali Amjad, Mart van Baalen, Christos Louizos, Tijmen Blankevoort

In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss.

Quantization

Paper
Add Code

LSQ+: Improving low-bit quantization through learnable offsets and better initialization

4 code implementations • 20 Apr 2020 • Yash Bhalgat, Jinwon Lee, Markus Nagel, Tijmen Blankevoort, Nojun Kwak

To solve this problem, we propose LSQ+, a natural extension of LSQ, wherein we introduce a general asymmetric quantization scheme with trainable scale and offset parameters that can learn to accommodate the negative activations.

Ranked #18 on Quantization on ImageNet