Instance-Aware Dynamic Neural Network Quantization

CVPR 2022 · Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao

Quantization is an effective way to reduce the memory and computational costs of deep neural networks, in which the full-precision weights and activations are represented using low-bit values. In most existing quantization methods, the bit-width of each layer is static, i.e., the same for all samples in the given dataset. However, natural images are highly diverse in content, so applying one universal quantization configuration to all samples is suboptimal. In this paper, we propose to conduct low-bit quantization for each image individually, and develop a dynamic quantization scheme to explore the optimal bit-widths. To this end, a lightweight bit-controller is established and trained jointly with the neural network to be quantized. During inference, the quantization configuration for an arbitrary image is determined by the bit-widths generated by the controller; e.g., an image with simple texture is allocated fewer bits and less computation, and vice versa. Experimental results on benchmarks demonstrate that the proposed dynamic quantization method achieves state-of-the-art performance in terms of accuracy and computational complexity. The code will be available at https://github.com/huawei-noah/Efficient-Computing and https://gitee.com/mindspore/models/tree/master/research/cv/DynamicQuant.
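
To make the mechanism concrete, here is a minimal PyTorch-style sketch of the idea: a small controller network predicts a per-layer bit-width choice for each input image, and layer weights are fake-quantized accordingly. The `BitController` architecture, the candidate bit-widths (2/4/8), the uniform quantizer, and the Gumbel-softmax relaxation are all illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(x, bits):
    """Uniform symmetric fake-quantization with a straight-through
    estimator (illustrative; the paper's exact quantizer may differ)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    x_q = (x / scale).round().clamp(-qmax, qmax) * scale
    return x + (x_q - x).detach()  # identity gradient in backward

class BitController(nn.Module):
    """Lightweight controller mapping an input image to a per-layer
    choice over candidate bit-widths (hypothetical architecture)."""
    def __init__(self, num_layers, candidate_bits=(2, 4, 8)):
        super().__init__()
        self.candidate_bits = candidate_bits
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, num_layers * len(candidate_bits))

    def forward(self, x, tau=1.0):
        logits = self.head(self.features(x))
        logits = logits.view(x.size(0), -1, len(self.candidate_bits))
        # Gumbel-softmax keeps the discrete bit choice differentiable so
        # the controller can train jointly with the backbone (a common
        # relaxation; assumed here, not taken from the paper).
        return F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)

# Usage sketch: quantize one layer's weights per the controller's choice.
controller = BitController(num_layers=18)
sel = controller(torch.randn(1, 3, 224, 224))   # [1, 18, 3], one-hot
weight = torch.randn(64, 3, 3, 3)
w_q = sum(sel[0, 0, i] * quantize(weight, b)
          for i, b in enumerate(controller.candidate_bits))
```

At inference the hard one-hot selection reduces each layer to a single bit-width, so a simple image can route through a cheaper low-bit configuration while a complex one keeps higher precision.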
