L2 Regularization

28 papers with code • 0 benchmarks • 0 datasets

See Weight Decay.

$L_{2}$ Regularization, or Weight Decay, is a regularization technique applied to the weights of a neural network. We minimize a loss function comprising both the primary loss function and a penalty on the $L_{2}$ norm of the weights:

$$L_{new}\left(w\right) = L_{original}\left(w\right) + \lambda{w^{T}w}$$

where $\lambda$ is a hyperparameter controlling the strength of the penalty (larger values encourage smaller weights).
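A minimal sketch of the objective-function formulation, assuming PyTorch and a hypothetical linear model; the penalty $\lambda w^{T}w$ is added to the primary loss before backpropagation:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)      # hypothetical model
criterion = nn.MSELoss()      # primary loss
lam = 1e-4                    # lambda, the penalty strength

x, y = torch.randn(32, 10), torch.randn(32, 1)

primary_loss = criterion(model(x), y)
# L2 penalty: sum of w^T w over all parameters
l2_penalty = sum((w ** 2).sum() for w in model.parameters())
loss = primary_loss + lam * l2_penalty
loss.backward()
```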

Weight decay can also be incorporated directly into the weight update rule, rather than implicitly through the objective function. In practice, weight decay often refers to the implementation specified directly in the update rule, whereas L2 regularization usually refers to the implementation specified in the objective function; the contrast is sketched below.
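A minimal sketch, assuming NumPy and plain SGD, contrasting the two formulations. For vanilla SGD they coincide up to a rescaling of $\lambda$, but they differ for adaptive optimizers (e.g. Adam with an L2 penalty in the loss versus decoupled weight decay as in AdamW):

```python
import numpy as np

lr, lam = 0.1, 1e-2
w = np.random.randn(5)
grad = np.random.randn(5)   # gradient of the primary loss only

# (a) L2 regularization: the penalty is part of the objective, so its
#     gradient 2*lam*w is added to the loss gradient before the step.
w_l2 = w - lr * (grad + 2 * lam * w)

# (b) Weight decay: the weights are shrunk directly in the update rule.
w_wd = (1 - 2 * lr * lam) * w - lr * grad

assert np.allclose(w_l2, w_wd)   # identical for vanilla SGD
```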

Latest papers with no code

Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

no code yet • 30 Jun 2021

The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$.

Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation

no code yet • ACL 2021

Meanwhile, we force the conventional decoder to simulate the behaviors of the seer decoder via knowledge distillation.

Effect of the regularization hyperparameter on deep learning-based segmentation in LGE-MRI

no code yet • 10 Dec 2020

The extent to which the arbitrarily selected L2 regularization hyperparameter value affects the outcome of semantic segmentation with deep learning is demonstrated.

Gram Regularization for Multi-view 3D Shape Retrieval

no code yet • 16 Nov 2020

To make up the gap, in this paper, we propose a novel regularization term called Gram regularization which reinforces the learning ability of the network by encouraging the weight kernels to extract different information on the corresponding feature map.

Exponentially Weighted l_2 Regularization Strategy in Constructing Reinforced Second-order Fuzzy Rule-based Model

no code yet • 2 Jul 2020

In the conventional Takagi-Sugeno-Kang (TSK)-type fuzzy models, constant or linear functions are usually utilized as the consequent parts of the fuzzy rules, but they cannot effectively describe the behavior within local regions defined by the antecedent parts.

An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning

no code yet • 10 May 2020

In addition, we propose a combination of L2 regularization and spectral normalization for the on-device reinforcement learning so that output values of the neural network can be fit into a certain range and the reinforcement learning becomes stable.

A Bayesian traction force microscopy method with automated denoising in a user-friendly software package

no code yet • 4 May 2020

Adherent biological cells generate traction forces on a substrate that play a central role for migration, mechanosensing, differentiation, and collective behavior.

Data-dependent Gaussian Prior Objective for Language Generation

no code yet • ICLR 2020

However, MLE focuses on once-to-all matching between the predicted sequence and gold-standard, consequently treating all incorrect predictions as being equally incorrect.

Correlated Initialization for Correlated Data

no code yet • 9 Mar 2020

Our theoretical analysis quantifies the learning behavior of weights of a single spatial filter.

Tighter Bound Estimation of Sensitivity Analysis for Incremental and Decremental Data Modification

no code yet • 6 Mar 2020

Specifically, the proposed algorithm can be used to estimate the upper and lower bounds of the updated classifier's coefficient matrix with a low computational complexity related to the size of the updated dataset.