Differentiable Training for Hardware Efficient LightNNs

NIPS Workshop CDNNRIA 2018 · Ruizhou Ding, Zeye Liu, Ting-Wu Chin, Diana Marculescu, R.D. (Shawn) Blanton

To reduce the runtime and resource utilization of Deep Neural Networks (DNNs) on customized hardware, LightNNs have been proposed, which constrain each DNN weight to be a sum of a limited number (denoted as $k\in\{1,2\}$) of powers of 2. LightNNs can therefore replace the multiplication between activations and weights with a single shift operation ($k=1$) or with two shifts and an add operation ($k=2$). To provide a more continuous Pareto-optimal curve of accuracy versus runtime, so that hardware designers have more flexible options for DNN configurations, one can customize $k$ for each convolutional filter. In this paper, we formulate the selection of $k$ to be differentiable, and train the model weights and the per-filter $k$ in an end-to-end fashion. Since flexible-$k$ LightNNs (FLightNNs) fully utilize the hardware resources on Field Programmable Gate Arrays (FPGAs), our experimental results show that FLightNNs can achieve a 2$\times$ speedup on FPGAs compared to LightNN-2, with only 0.1\% accuracy degradation. In addition, compared to 4-bit fixed-point quantization, FLightNNs can achieve slightly higher accuracy and a 1.4$\times$ speedup, due to their lightweight shift operations.
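
The power-of-2 weight constraint and the resulting shift-based multiplication can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `power_of_two_terms` and `shift_multiply` are hypothetical, the greedy residual rounding in the log domain is an assumed quantization rule, and the activation is assumed to be a non-negative integer (fixed-point) value. The differentiable selection of $k$ is not shown.

```python
import math

def power_of_two_terms(w, k):
    """Greedily approximate w as a sum of at most k signed powers of 2.

    Returns a list of (sign, exponent) pairs. Rounding the exponent in the
    log domain is an assumption made for this sketch.
    """
    terms = []
    residual = w
    for _ in range(k):
        if residual == 0:
            break  # w is already represented exactly
        exponent = round(math.log2(abs(residual)))
        sign = 1 if residual > 0 else -1
        terms.append((sign, exponent))
        residual -= sign * 2.0 ** exponent
    return terms

def shift_multiply(x, terms):
    """Multiply an integer activation x by the quantized weight using shifts and adds only."""
    total = 0
    for sign, exponent in terms:
        # Non-negative exponents shift left; negative exponents shift right.
        shifted = x << exponent if exponent >= 0 else x >> -exponent
        total += sign * shifted
    return total

# k = 2: 0.75 = 2^0 - 2^-2, so the product needs two shifts and one add.
terms_k2 = power_of_two_terms(0.75, 2)
print(terms_k2)                      # [(1, 0), (-1, -2)]
print(shift_multiply(12, terms_k2))  # 9, equal to 12 * 0.75

# k = 1: the same weight is rounded to 2^0, so the product is a single shift.
terms_k1 = power_of_two_terms(0.75, 1)
print(shift_multiply(12, terms_k1))  # 12
```

In this sketch, moving from $k=1$ to $k=2$ removes the approximation error for the weight 0.75 at the cost of one extra shift and one add, which is the accuracy-versus-runtime trade-off that a per-filter choice of $k$ exposes.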
