L2 Regularization

28 papers with code • 0 benchmarks • 0 datasets

See Weight Decay.

$L_{2}$ Regularization, or Weight Decay, is a regularization technique applied to the weights of a neural network. We minimize a loss function comprising both the primary loss function and a penalty on the $L_{2}$ norm of the weights:

$$L_{new}\left(w\right) = L_{original}\left(w\right) + \lambda{w^{T}w}$$

where $\lambda$ is a hyperparameter controlling the strength of the penalty (larger values encourage smaller weights).

Weight decay can also be incorporated directly into the weight update rule, rather than implicitly through the objective function. In practice, "weight decay" often refers to the implementation specified directly in the weight update rule, whereas "L2 regularization" usually refers to the implementation specified in the objective function.
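The two views are linked by differentiating the penalty: $\nabla L_{new}\left(w\right) = \nabla L_{original}\left(w\right) + 2\lambda w$, so a gradient step $w \leftarrow w - \eta\left(\nabla L_{original}\left(w\right) + 2\lambda w\right)$ is an ordinary step on the primary loss followed by shrinking ("decaying") the weights. The sketch below, assuming PyTorch and purely illustrative values for `lam` and `lr`, contrasts the two implementations: adding the penalty to the objective versus applying the decay directly in the update rule.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)              # illustrative model
x, y = torch.randn(32, 10), torch.randn(32, 1)
lam, lr = 1e-4, 0.1                         # illustrative penalty strength and step size

# (1) L2 regularization: add lambda * w^T w to the objective and differentiate the sum.
loss = F.mse_loss(model(x), y) + lam * sum((p * p).sum() for p in model.parameters())
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        p -= lr * p.grad                    # gradient already contains the 2*lam*w term

model.zero_grad()

# (2) Weight decay: differentiate only the primary loss, then shrink weights in the update.
F.mse_loss(model(x), y).backward()
with torch.no_grad():
    for p in model.parameters():
        p -= lr * (p.grad + 2 * lam * p)    # decay term added directly to the update rule
```

For plain (stochastic) gradient descent the two updates coincide; with adaptive optimizers such as Adam the decoupled (weight-decay) update generally behaves differently, which is why the distinction matters in practice.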

Latest papers with no code

Linking Neural Collapse and L2 Normalization with Improved Out-of-Distribution Detection in Deep Neural Networks

no code yet • 17 Sep 2022

We propose a simple modification to standard ResNet architectures--L2 normalization over feature space--that substantially improves out-of-distribution (OoD) performance on the previously proposed Deep Deterministic Uncertainty (DDU) benchmark.

On the utility and protection of optimization with differential privacy and classic regularization techniques

no code yet • 7 Sep 2022

According to the literature, this approach has proven to be a successful defence against several models' privacy attacks, but its downside is a substantial degradation of the models' performance.

Perturbation of Deep Autoencoder Weights for Model Compression and Classification of Tabular Data

no code yet • 17 May 2022

Unlike dropout learning, the proposed weight perturbation routine additionally achieves 15% to 40% sparsity across six tabular data sets for the compression of deep pretrained models.

Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks

no code yet • 15 May 2022

L2 regularization for weights in neural networks is widely used as a standard training trick.

A Note on the Regularity of Images Generated by Convolutional Neural Networks

no code yet • 22 Apr 2022

The regularity of images generated by convolutional neural networks, such as the U-net, generative networks, or the deep image prior, is analyzed.

A Closer Look at Rehearsal-Free Continual Learning

no code yet • 31 Mar 2022

Next, we explore how to leverage knowledge from a pre-trained model in rehearsal-free continual learning and find that vanilla L2 parameter regularization outperforms EWC parameter regularization and feature distillation.

Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

no code yet • 22 Oct 2021

In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.

Regularized Training of Nearest Neighbor Language Models

no code yet • NAACL (ACL) 2022

In particular, we find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low-frequency ones.

Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

no code yet • 30 Jun 2021

The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$.

Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation

no code yet • ACL 2021

Meanwhile, we force the conventional decoder to simulate the behaviors of the seer decoder via knowledge distillation.