Dynamic Probabilistic Pruning: Training sparse networks based on stochastic and dynamic masking

Deep Learning (DL) models are known to be heavily over-parameterized, resulting in a large memory footprint and high power consumption. This hampers the use of such models in hardware-constrained edge technologies such as wearables and mobile devices. Model compression during training can be achieved by promoting sparse network structures, both through weight regularization and through dynamic pruning methods. State-of-the-art pruning methods are, however, mostly magnitude-based, which impedes their use in, e.g., binary settings. Importantly, most pruning methods do not provide structural sparsity, resulting in inefficient memory allocation and access in hardware implementations. In this paper, we propose a novel dynamic pruning solution that we term Dynamic Probabilistic Pruning (DPP). DPP leverages Gumbel top-K sampling to select subsets of weights during training, which enables the network to explore which weights are most relevant. Our approach allows setting an explicit, layer-wise per-neuron sparsity level and provides structural pruning across weights and feature maps, without relying on weight-magnitude heuristics. Notably, our method generates hardware-oriented structural sparsity for fully-connected and convolutional layers, which facilitates memory allocation and access, in contrast with conventional unstructured pruning. We show that DPP achieves sparsity levels and classification accuracy on the MNIST, CIFAR-10, and CIFAR-100 datasets that are competitive with a state-of-the-art baseline across various DL architectures, while respecting per-neuron sparsity constraints.
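To make the idea concrete, below is a minimal sketch of Gumbel top-K weight masking for a fully-connected layer, written against the abstract only. The abstract states that Gumbel top-K sampling selects a subset of weights per neuron during training; everything else here (learnable per-weight logits, a hard top-K mask in the forward pass, a straight-through softmax relaxation for gradients, and the names GumbelTopKLinear and k_per_neuron) is an illustrative assumption, not the authors' implementation.

    # Sketch: Gumbel top-K masking with an explicit per-neuron sparsity level.
    # Assumptions: learnable per-weight logits and straight-through gradients
    # (not specified in the abstract).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GumbelTopKLinear(nn.Module):
        def __init__(self, in_features, out_features, k_per_neuron):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            self.bias = nn.Parameter(torch.zeros(out_features))
            # One logit per weight; a higher logit means a higher chance of being kept.
            self.logits = nn.Parameter(torch.zeros(out_features, in_features))
            self.k = k_per_neuron  # explicit per-neuron sparsity level

        def forward(self, x, tau=1.0):
            if self.training:
                # Gumbel top-K: perturb the logits with Gumbel(0, 1) noise and keep
                # the K largest entries per output neuron (per row of the weight matrix).
                u = torch.rand_like(self.logits)
                gumbel = -torch.log(-torch.log(u + 1e-10) + 1e-10)
                scores = (self.logits + gumbel) / tau
            else:
                scores = self.logits  # deterministic top-K at inference
            topk = scores.topk(self.k, dim=1).indices
            hard_mask = torch.zeros_like(scores).scatter_(1, topk, 1.0)
            # Straight-through estimator: hard mask in the forward pass,
            # gradients flow through the softmax relaxation.
            soft = torch.softmax(scores, dim=1)
            mask = hard_mask + soft - soft.detach()
            return F.linear(x, self.weight * mask, self.bias)

    # Usage: keep 16 of the 784 incoming weights of each neuron.
    layer = GumbelTopKLinear(784, 128, k_per_neuron=16)
    out = layer(torch.randn(32, 784))

Because every row keeps exactly K nonzero weights, the resulting sparsity pattern is structured per neuron, which is what makes memory allocation and access more regular than with unstructured magnitude pruning.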
