Search Results for author: Alessandro Pappalardo

Found 7 papers, 3 papers with code

A2Q+: Improving Accumulator-Aware Weight Quantization

no code implementations · 19 Jan 2024 Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig, Yaman Umuroglu

Recent studies show that reducing the precision of the accumulator as well can further improve hardware efficiency, at the risk of numerical overflow, which introduces arithmetic errors that can degrade model accuracy.

Quantization
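
To make the overflow risk described in this abstract concrete, here is a small self-contained Python sketch (illustrative only, not the paper's method) that accumulates the products of an 8-bit dot product in a 16-bit register and compares the result against exact wide accumulation:

```python
import numpy as np

def wrap16(v: int) -> int:
    """Reduce v modulo 2**16 and reinterpret as signed 16-bit two's complement."""
    v &= 0xFFFF
    return v - 0x10000 if v >= 0x8000 else v

rng = np.random.default_rng(0)
# 8-bit signed weights and activations for a single dot product.
w = rng.integers(-128, 128, size=512).tolist()
x = rng.integers(-128, 128, size=512).tolist()

ref = sum(wi * xi for wi, xi in zip(w, x))   # wide accumulator: exact
acc = 0
for wi, xi in zip(w, x):
    acc = wrap16(acc + wi * xi)              # 16-bit accumulator: may wrap

print("exact sum         :", ref)
print("16-bit accumulator:", acc)
```

The two results differ whenever the exact sum falls outside the signed 16-bit range, which for a 512-element dot product of random 8-bit values it usually does.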

Post-Training Quantization with Low-precision Minifloats and Integers on FPGAs

no code implementations · 21 Nov 2023 Shivam Aggarwal, Alessandro Pappalardo, Hans Jakob Damsgaard, Giuseppe Franco, Thomas B. Preußer, Michaela Blott, Tulika Mitra

However, floating-point formats smaller than 8 bits, and their comparison with integer quantization, remain relatively unexplored.

Model Compression · Quantization
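
For intuition about what a sub-8-bit minifloat grid looks like, here is a hedged numpy sketch that rounds values to a toy format with configurable exponent and mantissa bits. It is a simplification (IEEE-style bias, subnormals, saturation to the largest normal, no infinities or NaNs); the formats and rounding rules evaluated in the paper may differ.

```python
import numpy as np

def quantize_minifloat(x, exp_bits=4, man_bits=3):
    """Round x to the nearest value of a toy minifloat format with
    exp_bits exponent bits and man_bits mantissa bits. Illustrative only."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 2 - bias        # largest normal exponent
    min_exp = 1 - bias                        # smallest normal exponent
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp

    sign, mag = np.sign(x), np.abs(x)
    # Per-value exponent, clamped at the normal/subnormal boundary.
    e = np.floor(np.log2(np.maximum(mag, np.finfo(np.float64).tiny)))
    e = np.clip(e, min_exp, max_exp)
    step = 2.0 ** (e - man_bits)              # grid spacing at exponent e
    q = np.round(mag / step) * step           # round to nearest grid point
    return sign * np.minimum(q, max_val)      # saturate, restore sign

print(quantize_minifloat([0.0137, 0.25, 1.1, 300.0], exp_bits=4, man_bits=3))
```

Shrinking the exponent and mantissa budgets trades dynamic range against resolution, which is exactly the axis on which such formats compete with integer quantization.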

A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance

no code implementations · ICCV 2023 Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig

We apply our method to deep learning-based computer vision tasks to show that A2Q can train QNNs for low-precision accumulators while maintaining model accuracy competitive with a floating-point baseline.

Quantization
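
Below is a minimal sketch of the kind of l1-norm budget that accumulator-aware quantization enforces, assuming unsigned N-bit activations and a signed P-bit accumulator; the function names are mine, and the paper's exact bound and weight-normalization parameterization differ in details.

```python
import numpy as np

def l1_budget(acc_bits: int, act_bits: int) -> float:
    """Largest l1 norm a weight vector may have so that a dot product
    with unsigned act_bits-bit inputs cannot overflow a signed
    acc_bits-bit accumulator (worst-case analysis, simplified)."""
    acc_max = 2 ** (acc_bits - 1) - 1
    act_max = 2 ** act_bits - 1
    return acc_max / act_max

def project_weights(w: np.ndarray, budget: float) -> np.ndarray:
    """Scale w down if its l1 norm exceeds the budget. A2Q itself bakes
    the constraint into a weight re-parameterization learned during
    training rather than projecting after the fact."""
    norm = np.abs(w).sum()
    return w if norm <= budget else w * (budget / norm)

w = np.random.default_rng(0).normal(size=256).astype(np.float32)
budget = l1_budget(acc_bits=16, act_bits=8)      # 32767 / 255 ≈ 128.5
print(budget, np.abs(project_weights(w, budget)).sum())
```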

Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance

no code implementations · 31 Jan 2023 Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig

Across all of our benchmark models trained with 8-bit weights and activations, we observe that constraining the hidden layers of quantized neural networks to fit into 16-bit accumulators yields an average 98.2% sparsity with an estimated compression rate of 46.5x, all while maintaining 99.2% of the floating-point performance.

Quantization
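
Going the other direction, a back-of-envelope check (illustrative, not the paper's formula) of how wide an accumulator a given integer weight vector demands also hints at where the reported sparsity comes from: fitting a 16-bit accumulator forces the l1 norm of the weights far below its unconstrained value, driving many quantized weights to exactly zero.

```python
import math
import numpy as np

def min_acc_bits(w: np.ndarray, act_bits: int = 8) -> int:
    """Minimum signed accumulator width that the worst-case dot product
    of integer weights w with unsigned act_bits-bit activations cannot
    overflow. Back-of-envelope; not the paper's exact expression."""
    act_max = 2 ** act_bits - 1
    worst = int(np.abs(w.astype(np.int64)).sum()) * act_max
    return math.ceil(math.log2(worst + 1)) + 1   # +1 for the sign bit

w = np.random.default_rng(1).integers(-128, 128, size=512, dtype=np.int64)
print(min_acc_bits(w))   # typically well above 16 before any sparsification
```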

QONNX: Representing Arbitrary-Precision Quantized Neural Networks

1 code implementation · 15 Jun 2022 Alessandro Pappalardo, Yaman Umuroglu, Michaela Blott, Jovan Mitrevski, Ben Hawks, Nhan Tran, Vladimir Loncar, Sioni Summers, Hendrik Borras, Jules Muhizi, Matthew Trahms, Shih-Chieh Hsu, Scott Hauck, Javier Duarte

We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks.

Quantization
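
QONNX's central addition is a quantizer node parameterized by a scale, a zero point, and an arbitrary bit width. The numpy sketch below shows that style of quantize-dequantize semantics; it is a simplified reading with rounding fixed to round-half-even, so consult the QONNX specification for the node's exact inputs, attributes, and rounding modes.

```python
import numpy as np

def quant_dequant(x, scale, zero_point, bit_width, signed=True, narrow=False):
    """Fake-quantize x on an arbitrary-bit-width integer grid, in the
    style of a QONNX-like quantizer node. Simplified and illustrative."""
    if signed:
        qmin = -(2 ** (bit_width - 1)) + (1 if narrow else 0)
        qmax = 2 ** (bit_width - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bit_width - 1
    q = np.clip(np.rint(np.asarray(x) / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale

print(quant_dequant(np.linspace(-1.0, 1.0, 9), scale=0.25,
                    zero_point=0.0, bit_width=3))   # values snap to a 3-bit grid
```

Because the bit width is an explicit parameter rather than baked into a dtype, one representation covers binary, ternary, and any other precision a downstream FPGA or accelerator toolchain might target.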

Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference

1 code implementation · 22 Feb 2021 Benjamin Hawks, Javier Duarte, Nicholas J. Fraser, Alessandro Pappalardo, Nhan Tran, Yaman Umuroglu

We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics.

Bayesian Optimization · Computational Efficiency +2
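
As a rough illustration of combining pruning with quantization-aware training, here is a hedged PyTorch sketch: one magnitude-pruning plus fake-quantization step with a straight-through gradient. The names and recipe are mine, not the paper's, which additionally studies pruning schedules, regularization, and batch normalization handling.

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Uniform fake-quantizer with a straight-through estimator gradient."""
    @staticmethod
    def forward(ctx, x, scale, bits):
        qmax = 2 ** (bits - 1) - 1
        return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None, None   # pass gradient straight through to x

def prune_and_quantize(weight, sparsity=0.5, bits=4):
    """One quantization-aware-pruning step: zero the smallest-magnitude
    weights, then fake-quantize the survivors. Illustrative only."""
    k = int(sparsity * weight.numel())
    flat = weight.abs().flatten()
    thresh = flat.kthvalue(k).values if k > 0 else flat.new_tensor(0.0)
    mask = (weight.abs() > thresh).float()
    scale = (weight.abs().max() / (2 ** (bits - 1) - 1)).detach()
    return FakeQuant.apply(weight * mask, scale, bits) * mask

w = torch.randn(64, 64, requires_grad=True)
w_qp = prune_and_quantize(w, sparsity=0.9, bits=4)
w_qp.sum().backward()                          # w.grad is populated via STE
print("sparsity ≈", (w_qp == 0).float().mean().item())
```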
