Search Results for author: Tingxuan Zhong

Found 1 paper, 1 paper with code

QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models

1 code implementation • 13 Oct 2023 • Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh

We show, for the first time, that the majority of inference computations for large generative models such as LLaMA, OPT, and Falcon can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups, while at the same time maintaining good accuracy.

Computational Efficiency, Quantization
