Optimizing Quantized Neural Networks with Natural Gradient
Quantized Neural Networks (QNNs) have greatly improved computational efficiency, making it possible to deploy large models on mobile and miniaturized devices. To narrow the performance gap between low-precision and full-precision models, we introduce the natural gradient for training low-precision models, which accounts for the curvature of the model by viewing the parameter space as a Riemannian manifold. Specifically, we propose a novel Optimized Natural Gradient Descent (ONGD) method that avoids computing the Fisher Information Matrix (FIM) and updates the parameters at a computational cost comparable to Stochastic Gradient Descent (SGD). An ablation study shows that a 4-bit quantized ResNet-32 trained with ONGD outperforms its SGD counterpart, achieving 2.05\% higher Top-1 accuracy on the CIFAR100 dataset. Further comparison experiments show that our method achieves state-of-the-art results on the CIFAR and ImageNet datasets, where the 8-bit version of MobileNet attains 0.25\%/0.13\% higher Top-1/Top-5 accuracy than the full-precision version on ImageNet.
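To make the computational contrast concrete, here is a minimal sketch of a plain natural-gradient step versus an SGD step on a toy quadratic loss. This is not the paper's ONGD method: it explicitly forms an empirical Fisher Information Matrix, which is exactly the expensive step the abstract says ONGD avoids. All names (`grad`, `F`, the toy loss, the damping constant) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([10.0, 0.1])           # ill-conditioned curvature of a toy quadratic loss
theta = np.array([1.0, 1.0])

def grad(theta):
    # Gradient of the toy loss 0.5 * theta^T A theta
    return A @ theta

lr = 0.05

# SGD step: theta <- theta - lr * g  (no curvature information)
theta_sgd = theta - lr * grad(theta)

# Natural-gradient step: theta <- theta - lr * F^{-1} g,
# with an empirical Fisher built from per-sample gradients.
# Forming and solving against F is the O(d^2)-O(d^3) cost that
# FIM-free methods like ONGD aim to sidestep.
samples = [grad(theta + 0.01 * rng.standard_normal(2)) for _ in range(64)]
F = sum(np.outer(g, g) for g in samples) / len(samples)
F += 1e-6 * np.eye(2)              # damping so F is invertible
theta_ng = theta - lr * np.linalg.solve(F, grad(theta))
```

The natural-gradient update rescales the raw gradient by the inverse Fisher, so steps are taken in the geometry of the model's output distribution rather than raw parameter space; the practical question, which ONGD addresses, is doing this without ever materializing `F`.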