Low Entropy Deep Networks

29 Sep 2021 · Chris Subia-Waud, Srinandan Dasmahapatra

The movement of data between processes and memory, not arithmetic operations, dominates the energy cost of deep learning inference. This work targets these data movement costs by reducing the number of unique weights in a network. The thinking goes that if the number of unique weights is kept small enough, then the entire network can be distributed and stored on processing elements (PEs) within accelerator designs, and the data movement cost of weight reads substantially reduced. To this end, we investigate the merits of a method we call Weight Fixing Networks (WFN). We design the approach to realise four model outcome objectives: i) very few unique weights, ii) low-entropy weight encodings, iii) unique weight values amenable to energy-saving versions of hardware multiplication, and iv) lossless task performance. Some of these goals are conflicting. To best balance these conflicts, we combine a few novel (and some well-trodden) tricks: a new regularisation term (i, ii), a view of clustering cost as relative distance change (i, ii, iv), and a focus on whole-network re-use of weights (i, iii). Our ImageNet experiments demonstrate lossless compression using 56x fewer unique weights and a 1.9x lower weight-space entropy than SOTA quantisation approaches.
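To make the two central quantities concrete, here is a minimal NumPy sketch, not the authors' implementation, of what "few unique weights" and "low-entropy weight encodings" mean in practice: every weight is snapped to its nearest value in a small shared codebook, the relative distance change |w - c| / |w| measures how far each weight had to move, and the entropy of the resulting codebook-index stream is the quantity WFN aims to keep low. The codebook values, the helper names, and the exact form of the relative-distance measure are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of weight sharing with a
# small codebook, relative distance change, and weight-encoding entropy.
import numpy as np

def assign_to_codebook(weights, codebook):
    """Snap each weight to its nearest shared codebook value."""
    w = weights.reshape(-1, 1)                     # (N, 1)
    c = np.asarray(codebook).reshape(1, -1)        # (1, K)
    idx = np.abs(w - c).argmin(axis=1)             # index of nearest value
    return np.asarray(codebook)[idx].reshape(weights.shape), idx

def relative_distance_change(weights, snapped):
    """|w - c| / |w|: how far each weight moved, relative to its magnitude
    (assumed form of the 'relative distance change' clustering cost)."""
    return np.abs(weights - snapped) / (np.abs(weights) + 1e-12)

def encoding_entropy(idx):
    """Shannon entropy (bits per weight) of the codebook-index stream."""
    counts = np.bincount(idx)
    p = counts[counts > 0] / idx.size
    return float(-(p * np.log2(p)).sum())

# Toy usage: 10,000 Gaussian 'weights' mapped onto just 8 shared values.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(100, 100))
codebook = np.array([-0.12, -0.06, -0.03, 0.0, 0.03, 0.06, 0.12, 0.24])
snapped, idx = assign_to_codebook(weights, codebook)
print("unique weights:", np.unique(snapped).size)                       # <= 8
print("entropy (bits/weight):", round(encoding_entropy(idx), 3))
print("mean relative change:", round(relative_distance_change(weights, snapped).mean(), 3))
```

With only 8 shared values the index stream needs at most 3 bits per weight, and a skewed codebook usage drives the entropy, and hence the storage and movement cost of the weights, lower still; the paper's contribution is training the network so that this snapping is lossless in task performance.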
