L2 Regularization
28 papers with code • 0 benchmarks • 0 datasets
See Weight Decay.
$L_{2}$ Regularization, or Weight Decay, is a regularization technique applied to the weights of a neural network. We minimize a loss function comprising both the primary loss and a penalty on the $L_{2}$ norm of the weights:
$$L_{new}\left(w\right) = L_{original}\left(w\right) + \lambda{w^{T}w}$$
where $\lambda$ is a hyperparameter that determines the strength of the penalty (larger values encourage smaller weights).
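As a minimal sketch of the penalized objective above (the data, the mean-squared-error task loss, and the helper name `l2_regularized_loss` are illustrative assumptions, not part of the original page):

```python
import numpy as np

def l2_regularized_loss(original_loss, w, lam=1e-4):
    # L_new(w) = L_original(w) + lambda * w^T w
    return original_loss + lam * np.dot(w, w)

# Toy example with a mean-squared-error task loss on random data.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
w = rng.normal(size=5)
mse = np.mean((X @ w - y) ** 2)
total = l2_regularized_loss(mse, w, lam=1e-4)
```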
Weight decay can also be incorporated directly into the weight update rule, rather than implicitly through the objective function. Often "weight decay" refers to this direct implementation in the update rule, whereas "L2 regularization" usually refers to the penalty specified in the objective function.
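As a rough sketch of this distinction (all function names and hyperparameter values below are illustrative assumptions), a plain gradient-descent step under the two formulations might look like:

```python
import numpy as np

def step_l2_penalty(w, grad_loss, lr=0.1, lam=1e-4):
    # Penalty in the objective: differentiating lambda * w^T w adds 2*lam*w
    # to the loss gradient before the usual gradient-descent step.
    return w - lr * (grad_loss + 2 * lam * w)

def step_weight_decay(w, grad_loss, lr=0.1, lam=1e-4):
    # Decoupled weight decay: shrink the weights directly in the update rule,
    # independently of the loss gradient.
    return (1 - lr * lam) * w - lr * grad_loss
```

For vanilla SGD the two coincide up to a rescaling of $\lambda$, but for adaptive optimizers such as Adam they generally differ, which motivates decoupled variants like AdamW.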
Benchmarks
These leaderboards are used to track progress in L2 Regularization
Latest papers
Monkeypox disease recognition model based on improved SE-InceptionV3
In the wake of the global spread of monkeypox, accurate disease recognition has become crucial.
Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus
Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the tradeoff between minimizing the fitting error and the norm of the learned model coefficients.
The Transient Nature of Emergent In-Context Learning in Transformers
The transient nature of ICL is observed in transformers across a range of model sizes and datasets, raising the question of how much to "overtrain" transformers when seeking compact, cheaper-to-run models.
Less is More -- Towards parsimonious multi-task models using structured sparsity
In this work, we introduce channel-wise l1/l2 group sparsity in the parameters (or weights) of the shared convolutional layers of the multi-task learning model.
Maintaining Plasticity in Deep Continual Learning
If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples.
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks through a combination of applied analysis and experimentation.
It's Enough: Relaxing Diagonal Constraints in Linear Autoencoders for Recommendation
Inspired by this analysis, we propose simple-yet-effective linear autoencoder models using diagonal inequality constraints, called Relaxed Linear AutoEncoder (RLAE) and Relaxed Denoising Linear AutoEncoder (RDLAE).
Planting and Mitigating Memorized Content in Predictive-Text Language Models
Language models are widely deployed to provide automatic text completion services in user products.
Motion Correction and Volumetric Reconstruction for Fetal Functional Magnetic Resonance Imaging Data
Here, we propose a novel framework, which estimates a high-resolution reference volume by using outlier-robust motion correction, and by utilizing Huber L2 regularization for intra-stack volumetric reconstruction of the motion-corrected fetal brain fMRI.
How Infinitely Wide Neural Networks Can Benefit from Multi-task Learning -- an Exact Macroscopic Characterization
In practice, multi-task learning (through learning features shared among tasks) is an essential property of deep neural networks (NNs).