L2 Regularization

28 papers with code • 0 benchmarks • 0 datasets

See Weight Decay.

$L_{2}$ Regularization, or Weight Decay, is a regularization technique applied to the weights of a neural network. We minimize a loss function comprising both the primary loss function and a penalty on the squared $L_{2}$ norm of the weights:

$$L_{new}\left(w\right) = L_{original}\left(w\right) + \lambda{w^{T}w}$$

where $\lambda$ is a hyperparameter determining the strength of the penalty (encouraging smaller weights).
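
For concreteness, here is a minimal sketch (not taken from any of the papers listed below) of the penalized objective and its gradient for a least-squares linear model; the names `X`, `y`, and `lam` are illustrative:

```python
import numpy as np

def l2_regularized_loss(w, X, y, lam):
    """L_new(w) = L_original(w) + lam * w^T w for a least-squares model y ~ X @ w."""
    residual = X @ w - y
    original_loss = 0.5 * np.mean(residual ** 2)  # primary loss L_original(w)
    penalty = lam * np.dot(w, w)                  # lambda * w^T w
    return original_loss + penalty

def l2_regularized_grad(w, X, y, lam):
    """Gradient of L_new: the penalty contributes an extra 2 * lam * w term."""
    grad_original = X.T @ (X @ w - y) / len(y)
    return grad_original + 2.0 * lam * w
```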

Weight decay can also be incorporated directly into the weight update rule, rather than only implicitly through the objective function. In practice, "weight decay" often refers to the implementation where the decay is specified directly in the weight update rule, whereas "L2 regularization" usually refers to the implementation specified in the objective function.
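
The difference can be made explicit with a small, hedged sketch of a vanilla gradient-descent step (the toy loss and variable names are illustrative, not from any specific paper below):

```python
import numpy as np

def grad_original(w):
    # Gradient of a toy primary loss L_original(w) = ||w - 1||^2.
    return 2.0 * (w - 1.0)

w_l2 = np.full(3, 5.0)   # trained with L2 regularization in the objective
w_wd = np.full(3, 5.0)   # trained with weight decay in the update rule
lr, lam = 0.1, 0.01

for _ in range(100):
    # (a) L2 regularization: the penalty's gradient 2 * lam * w is folded
    #     into the loss gradient before the usual step.
    w_l2 = w_l2 - lr * (grad_original(w_l2) + 2.0 * lam * w_l2)

    # (b) Weight decay: the weights are shrunk directly in the update rule,
    #     independently of the loss gradient (decoupled, AdamW-style decay).
    w_wd = (1.0 - lr * lam) * w_wd - lr * grad_original(w_wd)
```

For plain SGD the two variants coincide up to a rescaling of $\lambda$, but for adaptive optimizers such as Adam they generally produce different updates, which is why the distinction matters in practice.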

Monkeypox disease recognition model based on improved SE-InceptionV3

jzc777/se-inceptionv3-l2 15 Mar 2024

In the wake of the global spread of monkeypox, accurate disease recognition has become crucial.

Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus

gabribg88/multiridge 23 Nov 2023

Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the tradeoff between minimizing the fitting error and the norm of the learned model coefficients.

The Transient Nature of Emergent In-Context Learning in Transformers

aadityasingh/icl-dynamics NeurIPS 2023

The transient nature of ICL is observed in transformers across a range of model sizes and datasets, raising the question of how much to "overtrain" transformers when seeking compact, cheaper-to-run models.

Less is More -- Towards parsimonious multi-task models using structured sparsity

ricupa/less-is-more-towards-parsimonious-multi-task-models-using-structured-sparsity 23 Aug 2023

In this work, we introduce channel-wise l1/l2 group sparsity in the shared convolutional layers parameters (or weights) of the multi-task learning model.

Maintaining Plasticity in Deep Continual Learning

shibhansh/loss-of-plasticity 23 Jun 2023

If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples.

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

epfml/req 26 May 2023

This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks through a combination of applied analysis and experimentation.

It's Enough: Relaxing Diagonal Constraints in Linear Autoencoders for Recommendation

jaewan7599/rdlae_sigir2023 22 May 2023

Inspired by this analysis, we propose simple-yet-effective linear autoencoder models using diagonal inequality constraints, called Relaxed Linear AutoEncoder (RLAE) and Relaxed Denoising Linear AutoEncoder (RDLAE).

Planting and Mitigating Memorized Content in Predictive-Text Language Models

microsoft/planting-and-mitigating-memorization 16 Dec 2022

Language models are widely deployed to provide automatic text completion services in user products.

Motion Correction and Volumetric Reconstruction for Fetal Functional Magnetic Resonance Imaging Data

gift-surg/niftymic 11 Feb 2022

Here, we propose a novel framework, which estimates a high-resolution reference volume by using outlier-robust motion correction, and by utilizing Huber L2 regularization for intra-stack volumetric reconstruction of the motion-corrected fetal brain fMRI.

How Infinitely Wide Neural Networks Can Benefit from Multi-task Learning -- an Exact Macroscopic Characterization

JakobHeiss/NN_regularization1 31 Dec 2021

In practice, multi-task learning (through learning features shared among tasks) is an essential property of deep neural networks (NNs).
