L2 Regularization

28 papers with code • 0 benchmarks • 0 datasets

See Weight Decay.

$L_{2}$ Regularization, or Weight Decay, is a regularization technique applied to the weights of a neural network. We minimize a loss function comprising both the primary loss function and a penalty on the $L_{2}$ norm of the weights:

$$L_{new}\left(w\right) = L_{original}\left(w\right) + \lambda{w^{T}w}$$

where $\lambda$ is a hyperparameter controlling the strength of the penalty (larger values encourage smaller weights).
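A minimal sketch of the objective-function formulation, assuming PyTorch and a hypothetical linear model; the penalty $\lambda w^{T}w$ is added to the primary loss before backpropagation:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)      # hypothetical model
criterion = nn.MSELoss()      # primary loss
lam = 1e-4                    # lambda, the penalty strength

x, y = torch.randn(32, 10), torch.randn(32, 1)

primary_loss = criterion(model(x), y)
# L2 penalty: sum of w^T w over all parameters
l2_penalty = sum((w ** 2).sum() for w in model.parameters())
loss = primary_loss + lam * l2_penalty
loss.backward()
```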

Weight decay can also be incorporated directly into the weight update rule, rather than implicitly through the objective function. In practice, weight decay often refers to the implementation specified directly in the update rule, whereas L2 regularization usually refers to the implementation specified in the objective function; the contrast is sketched below.
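A minimal sketch, assuming NumPy and plain SGD, contrasting the two formulations. For vanilla SGD they coincide up to a rescaling of $\lambda$, but they differ for adaptive optimizers (e.g. Adam with an L2 penalty in the loss versus decoupled weight decay as in AdamW):

```python
import numpy as np

lr, lam = 0.1, 1e-2
w = np.random.randn(5)
grad = np.random.randn(5)   # gradient of the primary loss only

# (a) L2 regularization: the penalty is part of the objective, so its
#     gradient 2*lam*w is added to the loss gradient before the step.
w_l2 = w - lr * (grad + 2 * lam * w)

# (b) Weight decay: the weights are shrunk directly in the update rule.
w_wd = (1 - 2 * lr * lam) * w - lr * grad

assert np.allclose(w_l2, w_wd)   # identical for vanilla SGD
```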

Latest papers with no code

Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

no code yet • 30 Jun 2021

The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$.

Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation

no code yet • ACL 2021

Meanwhile, we force the conventional decoder to simulate the behaviors of the seer decoder via knowledge distillation.

Effect of the regularization hyperparameter on deep learning-based segmentation in LGE-MRI

no code yet • 10 Dec 2020

The extent to which the arbitrarily selected L2 regularization hyperparameter value affects the outcome of semantic segmentation with deep learning is demonstrated.

Gram Regularization for Multi-view 3D Shape Retrieval

no code yet • 16 Nov 2020

To make up the gap, in this paper, we propose a novel regularization term called Gram regularization which reinforces the learning ability of the network by encouraging the weight kernels to extract different information on the corresponding feature map.

Exponentially Weighted l_2 Regularization Strategy in Constructing Reinforced Second-order Fuzzy Rule-based Model

no code yet • 2 Jul 2020

In the conventional Takagi-Sugeno-Kang (TSK)-type fuzzy models, constant or linear functions are usually utilized as the consequent parts of the fuzzy rules, but they cannot effectively describe the behavior within local regions defined by the antecedent parts.

An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning

no code yet • 10 May 2020

In addition, we propose a combination of L2 regularization and spectral normalization for the on-device reinforcement learning so that output values of the neural network can be fit into a certain range and the reinforcement learning becomes stable.

A Bayesian traction force microscopy method with automated denoising in a user-friendly software package

no code yet • 4 May 2020

Adherent biological cells generate traction forces on a substrate that play a central role for migration, mechanosensing, differentiation, and collective behavior.

Data-dependent Gaussian Prior Objective for Language Generation

no code yet • ICLR 2020

However, MLE focuses on once-to-all matching between the predicted sequence and gold-standard, consequently treating all incorrect predictions as being equally incorrect.

Correlated Initialization for Correlated Data

no code yet • 9 Mar 2020

Our theoretical analysis quantifies the learning behavior of weights of a single spatial filter.

Tighter Bound Estimation of Sensitivity Analysis for Incremental and Decremental Data Modification

no code yet • 6 Mar 2020

Specifically, the proposed algorithm can be used to estimate the upper and lower bounds of the updated classifier's coefficient matrix with a low computational complexity related to the size of the updated dataset.