The Regularizing Effect of Different Output Layer Designs in Deep Neural Networks

NeurIPS 2021  ·  Benjamin Bergner, Christoph Lippert

Deep neural networks are prone to overfitting, especially on small datasets. Common regularizers such as dropout or DropConnect reduce overfitting, but are complex and sensitive to hyperparameter choices, which prolongs development cycles in practice. In this paper, we propose simple but effective design changes to the output layer - namely randomization, sparsity, activation scaling, and ensembling - that lead to improved regularization. These designs are motivated by experiments showing that standard fully-connected output layers tend to rely on individual input neurons, which in turn do not cover the variance of the data. We call these two related phenomena neuron dependency and expressivity, propose different ways to measure them, and optimize the presented output layers with respect to them. In our experiments, we compare these layer types for image classification and semantic segmentation across architectures, datasets, and application settings. We report significant and consistent improvements of up to 10 percentage points in accuracy over standard output layers, while reducing the number of trainable parameters by up to 90%. We demonstrate that output layers need not be trained and are not themselves crucial components of deep networks.
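
The abstract names four output-layer designs (randomization, sparsity, activation scaling, ensembling) but gives no implementation details. The sketch below is one plausible PyTorch rendering of these ideas, not the authors' code: a frozen random linear head whose weights are zeroed out by a sparsity mask, whose input activations are scaled, plus an ensemble that averages several such heads. All class and parameter names (FixedSparseOutput, EnsembleOutput, sparsity, scale, n_heads) are illustrative assumptions.

```python
# Sketch of output-layer designs described in the abstract (assumed details).
import torch
import torch.nn as nn


class FixedSparseOutput(nn.Module):
    """Output layer with random, sparse, untrained weights and activation scaling."""

    def __init__(self, in_features: int, num_classes: int,
                 sparsity: float = 0.9, scale: float = 1.0):
        super().__init__()
        weight = torch.randn(num_classes, in_features)        # randomization
        mask = (torch.rand_like(weight) > sparsity).float()   # sparsity: keep ~(1 - sparsity) of weights
        self.register_buffer("weight", weight * mask)         # buffer => never updated by the optimizer
        self.scale = scale                                     # activation scaling

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(self.scale * features, self.weight)


class EnsembleOutput(nn.Module):
    """Ensembling: average the logits of several independently initialized heads."""

    def __init__(self, in_features: int, num_classes: int, n_heads: int = 5):
        super().__init__()
        self.heads = nn.ModuleList(
            FixedSparseOutput(in_features, num_classes) for _ in range(n_heads)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return torch.stack([h(features) for h in self.heads]).mean(dim=0)
```

Because the head's weights are stored in a buffer rather than as parameters, only the backbone is updated during training, which is one way to realize the abstract's claim that the output layer itself need not be trained.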

