Regularization

Weight Decay

Weight Decay, or $L_{2}$ Regularization, is a regularization technique applied to the weights of a neural network. We minimize a loss function comprising both the primary loss function and a penalty on the squared $L_{2}$ norm of the weights:

$$L_{new}\left(w\right) = L_{original}\left(w\right) + \lambda{w^{T}w}$$

where $\lambda$ is a hyperparameter that sets the strength of the penalty; larger values push the weights toward smaller magnitudes.
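As a minimal sketch of the penalized objective above, the snippet below adds $\lambda w^{T}w$ to a given loss value. The function name `l2_penalized_loss`, the flat list of weights, and the numeric values are illustrative assumptions, not part of the original text.

```python
def l2_penalized_loss(original_loss, weights, lam):
    """Return L_original(w) + lam * w^T w for a flat list of weights."""
    # w^T w is the squared L2 norm of the weight vector.
    penalty = sum(w * w for w in weights)
    return original_loss + lam * penalty


# Hypothetical values chosen only for illustration.
weights = [0.5, -1.0, 2.0]
original_loss = 0.8
lam = 0.01

total = l2_penalized_loss(original_loss, weights, lam)
print(total)
```

Here the penalty term is $0.01 \times 5.25$, so the regularized loss is slightly larger than the original loss, and it grows as the weights grow.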

Weight decay can also be incorporated directly into the weight update rule, rather than only implicitly through the objective function. The term "weight decay" often refers to the implementation specified directly in the weight update rule, whereas "$L_{2}$ regularization" usually refers to the implementation specified in the objective function.
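The two implementations can be compared with a short sketch. It assumes plain gradient descent with learning rate `lr`; the function names and numbers are hypothetical. Since the gradient of $\lambda w^{T}w$ is $2\lambda w$, the objective-based update $w \leftarrow w - \eta\,(g + 2\lambda w)$ equals the update-rule form $w \leftarrow (1 - 2\eta\lambda)\,w - \eta g$.

```python
def sgd_step_l2(w, grad, lr, lam):
    """L2 regularization: the penalty gradient 2 * lam * w is added to the loss gradient."""
    return [wi - lr * (gi + 2.0 * lam * wi) for wi, gi in zip(w, grad)]


def sgd_step_decay(w, grad, lr, decay):
    """Weight decay: weights are shrunk by (1 - decay) directly in the update rule."""
    return [(1.0 - decay) * wi - lr * gi for wi, gi in zip(w, grad)]


# Hypothetical weights and gradients of the original (unpenalized) loss.
w = [0.5, -1.0, 2.0]
grad = [0.1, -0.2, 0.3]
lr, lam = 0.1, 0.01

w_l2 = sgd_step_l2(w, grad, lr, lam)
# Choosing decay = 2 * lr * lam makes the two updates match for plain gradient descent.
w_decay = sgd_step_decay(w, grad, lr, decay=2.0 * lr * lam)

print(w_l2)
print(w_decay)
```

Under this assumption the two steps agree up to floating-point rounding. With adaptive optimizers such as Adam, the two forms generally produce different updates, which is one reason the distinction in terminology matters.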

Image Source: Deep Learning, Goodfellow et al

Tasks


Task                     Papers   Share
Language Modelling       78       9.65%
Retrieval                77       9.53%
Question Answering       48       5.94%
Large Language Model     41       5.07%
Sentence                 25       3.09%
In-Context Learning      22       2.72%
Text Generation          20       2.48%
Information Retrieval    19       2.35%
Code Generation          14       1.73%
