Search Results for author: Alessandro Pappalardo

Found 7 papers, 3 papers with code

A2Q+: Improving Accumulator-Aware Weight Quantization

no code implementations · 19 Jan 2024 Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig, Yaman Umuroglu

Recent studies show that reducing the precision of the accumulator as well can further improve hardware efficiency, at the risk of numerical overflow, which introduces arithmetic errors that can degrade model accuracy.

Quantization
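
To make the overflow risk described in this abstract concrete, here is a small self-contained Python sketch (illustrative only, not the paper's method) that accumulates the products of an 8-bit dot product in a 16-bit register and compares the result against exact wide accumulation:

```python
import numpy as np

def wrap16(v: int) -> int:
    """Reduce v modulo 2**16 and reinterpret as signed 16-bit two's complement."""
    v &= 0xFFFF
    return v - 0x10000 if v >= 0x8000 else v

rng = np.random.default_rng(0)
# 8-bit signed weights and activations for a single dot product.
w = rng.integers(-128, 128, size=512).tolist()
x = rng.integers(-128, 128, size=512).tolist()

ref = sum(wi * xi for wi, xi in zip(w, x))   # wide accumulator: exact
acc = 0
for wi, xi in zip(w, x):
    acc = wrap16(acc + wi * xi)              # 16-bit accumulator: may wrap

print("exact sum         :", ref)
print("16-bit accumulator:", acc)
```

The two results differ whenever the exact sum falls outside the signed 16-bit range, which for a 512-element dot product of random 8-bit values it usually does.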

Post-Training Quantization with Low-precision Minifloats and Integers on FPGAs

no code implementations · 21 Nov 2023 Shivam Aggarwal, Alessandro Pappalardo, Hans Jakob Damsgaard, Giuseppe Franco, Thomas B. Preußer, Michaela Blott, Tulika Mitra

However, floating-point formats smaller than 8 bits, and their comparison with integer quantization, remain relatively unexplored.

Model Compression · Quantization
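
For intuition about what a sub-8-bit minifloat grid looks like, here is a hedged numpy sketch that rounds values to a toy format with configurable exponent and mantissa bits. It is a simplification (IEEE-style bias, subnormals, saturation to the largest normal, no infinities or NaNs); the formats and rounding rules evaluated in the paper may differ.

```python
import numpy as np

def quantize_minifloat(x, exp_bits=4, man_bits=3):
    """Round x to the nearest value of a toy minifloat format with
    exp_bits exponent bits and man_bits mantissa bits. Illustrative only."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 2 - bias        # largest normal exponent
    min_exp = 1 - bias                        # smallest normal exponent
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp

    sign, mag = np.sign(x), np.abs(x)
    # Per-value exponent, clamped at the normal/subnormal boundary.
    e = np.floor(np.log2(np.maximum(mag, np.finfo(np.float64).tiny)))
    e = np.clip(e, min_exp, max_exp)
    step = 2.0 ** (e - man_bits)              # grid spacing at exponent e
    q = np.round(mag / step) * step           # round to nearest grid point
    return sign * np.minimum(q, max_val)      # saturate, restore sign

print(quantize_minifloat([0.0137, 0.25, 1.1, 300.0], exp_bits=4, man_bits=3))
```

Shrinking the exponent and mantissa budgets trades dynamic range against resolution, which is exactly the axis on which such formats compete with integer quantization.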

A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance

no code implementations · ICCV 2023 Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig

We apply our method to deep learning-based computer vision tasks to show that A2Q can train QNNs for low-precision accumulators while maintaining model accuracy competitive with a floating-point baseline.

Quantization
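
Below is a minimal sketch of the kind of l1-norm budget that accumulator-aware quantization enforces, assuming unsigned N-bit activations and a signed P-bit accumulator; the function names are mine, and the paper's exact bound and weight-normalization parameterization differ in details.

```python
import numpy as np

def l1_budget(acc_bits: int, act_bits: int) -> float:
    """Largest l1 norm a weight vector may have so that a dot product
    with unsigned act_bits-bit inputs cannot overflow a signed
    acc_bits-bit accumulator (worst-case analysis, simplified)."""
    acc_max = 2 ** (acc_bits - 1) - 1
    act_max = 2 ** act_bits - 1
    return acc_max / act_max

def project_weights(w: np.ndarray, budget: float) -> np.ndarray:
    """Scale w down if its l1 norm exceeds the budget. A2Q itself bakes
    the constraint into a weight re-parameterization learned during
    training rather than projecting after the fact."""
    norm = np.abs(w).sum()
    return w if norm <= budget else w * (budget / norm)

w = np.random.default_rng(0).normal(size=256).astype(np.float32)
budget = l1_budget(acc_bits=16, act_bits=8)      # 32767 / 255 ≈ 128.5
print(budget, np.abs(project_weights(w, budget)).sum())
```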

Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance

no code implementations · 31 Jan 2023 Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig

Across all of our benchmark models trained with 8-bit weights and activations, we observe that constraining the hidden layers of quantized neural networks to fit into 16-bit accumulators yields an average 98.2% sparsity with an estimated compression rate of 46.5x, all while maintaining 99.2% of the floating-point performance.

Quantization
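
Going the other direction, a back-of-envelope check (illustrative, not the paper's formula) of how wide an accumulator a given integer weight vector demands also hints at where the reported sparsity comes from: fitting a 16-bit accumulator forces the l1 norm of the weights far below its unconstrained value, driving many quantized weights to exactly zero.

```python
import math
import numpy as np

def min_acc_bits(w: np.ndarray, act_bits: int = 8) -> int:
    """Minimum signed accumulator width that the worst-case dot product
    of integer weights w with unsigned act_bits-bit activations cannot
    overflow. Back-of-envelope; not the paper's exact expression."""
    act_max = 2 ** act_bits - 1
    worst = int(np.abs(w.astype(np.int64)).sum()) * act_max
    return math.ceil(math.log2(worst + 1)) + 1   # +1 for the sign bit

w = np.random.default_rng(1).integers(-128, 128, size=512, dtype=np.int64)
print(min_acc_bits(w))   # typically well above 16 before any sparsification
```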

QONNX: Representing Arbitrary-Precision Quantized Neural Networks

1 code implementation · 15 Jun 2022 Alessandro Pappalardo, Yaman Umuroglu, Michaela Blott, Jovan Mitrevski, Ben Hawks, Nhan Tran, Vladimir Loncar, Sioni Summers, Hendrik Borras, Jules Muhizi, Matthew Trahms, Shih-Chieh Hsu, Scott Hauck, Javier Duarte

We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks.

Quantization
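
QONNX's central addition is a quantizer node parameterized by a scale, a zero point, and an arbitrary bit width. The numpy sketch below shows that style of quantize-dequantize semantics; it is a simplified reading with rounding fixed to round-half-even, so consult the QONNX specification for the node's exact inputs, attributes, and rounding modes.

```python
import numpy as np

def quant_dequant(x, scale, zero_point, bit_width, signed=True, narrow=False):
    """Fake-quantize x on an arbitrary-bit-width integer grid, in the
    style of a QONNX-like quantizer node. Simplified and illustrative."""
    if signed:
        qmin = -(2 ** (bit_width - 1)) + (1 if narrow else 0)
        qmax = 2 ** (bit_width - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bit_width - 1
    q = np.clip(np.rint(np.asarray(x) / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale

print(quant_dequant(np.linspace(-1.0, 1.0, 9), scale=0.25,
                    zero_point=0.0, bit_width=3))   # values snap to a 3-bit grid
```

Because the bit width is an explicit parameter rather than baked into a dtype, one representation covers binary, ternary, and any other precision a downstream FPGA or accelerator toolchain might target.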

Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference

1 code implementation · 22 Feb 2021 Benjamin Hawks, Javier Duarte, Nicholas J. Fraser, Alessandro Pappalardo, Nhan Tran, Yaman Umuroglu

We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics.

Bayesian Optimization · Computational Efficiency +2
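
As a rough illustration of combining pruning with quantization-aware training, here is a hedged PyTorch sketch: one magnitude-pruning plus fake-quantization step with a straight-through gradient. The names and recipe are mine, not the paper's, which additionally studies pruning schedules, regularization, and batch normalization handling.

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Uniform fake-quantizer with a straight-through estimator gradient."""
    @staticmethod
    def forward(ctx, x, scale, bits):
        qmax = 2 ** (bits - 1) - 1
        return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None, None   # pass gradient straight through to x

def prune_and_quantize(weight, sparsity=0.5, bits=4):
    """One quantization-aware-pruning step: zero the smallest-magnitude
    weights, then fake-quantize the survivors. Illustrative only."""
    k = int(sparsity * weight.numel())
    flat = weight.abs().flatten()
    thresh = flat.kthvalue(k).values if k > 0 else flat.new_tensor(0.0)
    mask = (weight.abs() > thresh).float()
    scale = (weight.abs().max() / (2 ** (bits - 1) - 1)).detach()
    return FakeQuant.apply(weight * mask, scale, bits) * mask

w = torch.randn(64, 64, requires_grad=True)
w_qp = prune_and_quantize(w, sparsity=0.9, bits=4)
w_qp.sum().backward()                          # w.grad is populated via STE
print("sparsity ≈", (w_qp == 0).float().mean().item())
```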
