R2 Loss: Range Restriction Loss for Model Compression and Quantization

14 Mar 2023  ·  Arnav Kundu, Chungkuk Yoo, Srijan Mishra, Minsik Cho, Saurabh Adya

Model quantization and compression are widely used techniques to reduce computing-resource usage at inference time. While state-of-the-art works have achieved reasonable accuracy at higher bit widths such as 4-bit or 8-bit, quantizing or compressing a model further, e.g., to 1-bit or 2-bit, remains challenging. To overcome this challenge, we focus on outliers in the weights of a pre-trained model, which disrupt effective lower-bit quantization and compression. In this work, we propose Range Restriction Loss (R2-Loss) for building lower-bit quantization- and compression-friendly models by removing outliers from the weights during pre-training. By effectively restricting the range of the weights, we mold the overall distribution into a tight shape that ensures high quantization bit resolution, allowing model compression and quantization techniques to better utilize their limited numeric representation power. We introduce three variants, L-inf R2-Loss, its extension Margin R2-Loss, and a new Soft-Min-Max R2-Loss, to be used as an auxiliary loss during full-precision model training. The variants suit different cases: L-inf and Margin R2-Loss are effective for symmetric quantization, while Soft-Min-Max R2-Loss shows better performance for model compression. In our experiments, R2-Loss improves lower-bit quantization accuracy with state-of-the-art post-training quantization (PTQ), quantization-aware training (QAT), and model compression techniques. With R2-Loss, MobileNet-V2 2-bit weight and 8-bit activation PTQ improves from 50.66% to 59.49%, MobileNet-V1 2-bit weight and activation QAT from 55.96% to 59.05%, and ResNet18 1-bit weight compression from 45.54% to 52.58%.
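
To make the idea concrete, here is a minimal PyTorch sketch of two auxiliary range-restriction penalties in the spirit of the abstract: an L-inf-style term that penalizes each layer's largest-magnitude weight, and a soft-min-max-style term that penalizes a differentiable approximation of each layer's weight range. The function names, the per-layer averaging, the `temperature` knob, and the `lambda_r2` weighting are illustrative assumptions; the exact formulations (including the Margin variant) are defined in the paper.

```python
import torch
import torch.nn as nn

def linf_r2_loss(model: nn.Module) -> torch.Tensor:
    """L-inf R2-Loss sketch: penalize the largest-magnitude weight of each
    layer, pulling outliers in so the overall weight range tightens.
    (Assumption: averaged over layers; see the paper for the exact form.)"""
    per_layer = [
        m.weight.abs().max()
        for m in model.modules()
        if isinstance(m, (nn.Conv2d, nn.Linear))
    ]
    return torch.stack(per_layer).mean()

def soft_min_max_r2_loss(model: nn.Module, temperature: float = 10.0) -> torch.Tensor:
    """Soft-Min-Max R2-Loss sketch: penalize a softened (max - min) of each
    weight tensor. Using log-sum-exp as the soft max keeps the penalty
    differentiable with respect to all weights, not only the current
    extremes. (`temperature` is a hypothetical knob, not from the paper.)"""
    per_layer = []
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            w = m.weight.flatten()
            soft_max = torch.logsumexp(w * temperature, dim=0) / temperature
            soft_min = -torch.logsumexp(-w * temperature, dim=0) / temperature
            per_layer.append(soft_max - soft_min)
    return torch.stack(per_layer).mean()

# Usage sketch: add the auxiliary term to the task loss during
# full-precision training (`lambda_r2` is a hypothetical weighting).
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
lambda_r2 = 0.1
task_loss = nn.functional.cross_entropy(
    model(torch.randn(2, 8)), torch.tensor([0, 3])
)
loss = task_loss + lambda_r2 * linf_r2_loss(model)
loss.backward()
```

One plausible reason the soft variant helps for compression is that log-sum-exp spreads gradient across all weights rather than only the single extreme entry, so the range shrinks smoothly during training; this is a reading of the abstract, not a claim from the paper's derivation.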

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| Model Compression | ImageNet | ResNet-18 + 1bit-1dim model compression using DKM | Top-1 | 59.7 | # 10 |
| Model Compression | ImageNet | ResNet-18 + 2bit-1dim model compression using DKM | Top-1 | 68.63 | # 5 |
| Model Compression | ImageNet | MobileNet-v1 + 1bit-1dim model compression using DKM | Top-1 | 52.58 | # 12 |
| Quantization | ImageNet | MobileNet-v1 + EWGS + R2Loss | Top-1 Accuracy (%) | 69.79 | # 25 |
| | | | Weight bits | 4 | # 4 |
| Quantization | ImageNet | ResNet-18 + PACT + R2Loss | Top-1 Accuracy (%) | 68.45 | # 27 |
| | | | Weight bits | 2 | # 1 |
| | | | Activation bits | 4 | # 1 |
| Quantization | ImageNet | MobileNet-v1 + LSQ + R2Loss | Top-1 Accuracy (%) | 69.64 | # 26 |
| Model Compression | ImageNet | ResNet-18 + 4bit-4dim model compression using DKM | Top-1 | 66.1 | # 7 |
| Model Compression | ImageNet | MobileNet-v1 + 4bit-4dim model compression using DKM | Top-1 | 61.4 | # 9 |
| Model Compression | ImageNet | MobileNet-v1 + 2bit-2dim model compression using DKM | Top-1 | 53.99 | # 11 |
| Model Compression | ImageNet | ResNet-18 + 2bit-2dim model compression using DKM | Top-1 | 64.7 | # 8 |
| Model Compression | ImageNet | MobileNet-v1 + 4bit-1dim model compression using DKM | Top-1 | 69.63 | # 4 |
| Model Compression | ImageNet | ResNet-18 + 4bit-1dim model compression using DKM | Top-1 | 70.52 | # 3 |
| Model Compression | ImageNet | MobileNet-v1 + 2bit-1dim model compression using DKM | Top-1 | 67.62 | # 6 |
| Model Compression | QNLI | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | # 2 |
| Model Compression | QNLI | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | # 1 |