LARS

Introduced by You et al. in Large Batch Training of Convolutional Networks

Layer-wise Adaptive Rate Scaling, or LARS, is a large batch optimization technique. There are two notable differences between LARS and other adaptive algorithms such as Adam or RMSProp: first, LARS uses a separate learning rate for each layer and not for each weight. And second, the magnitude of the update is controlled with respect to the weight norm for better control of training speed.

$$m_{t} = \beta_{1}m_{t-1} + \left(1-\beta_{1}\right)\left(g_{t} + \lambda{x_{t}}\right)$$ $$x_{t+1}^{\left(i\right)} = x_{t}^{\left(i\right)} - \eta_{t}\frac{\phi\left(|| x_{t}^{\left(i\right)} ||\right)}{|| m_{t}^{\left(i\right)} || }m_{t}^{\left(i\right)} $$

Source: Large Batch Training of Convolutional Networks

Read Paper See Code

Papers

Paper	Code	Results	Date	Stars

Tasks

Task	Papers	Share
Self-Supervised Learning	17	15.45%
Image Classification	10	9.09%
Clustering	6	5.45%
Semantic Segmentation	5	4.55%
Question Answering	4	3.64%
Object Detection	4	3.64%
Semi-Supervised Image Classification	4	3.64%
Self-Supervised Image Classification	3	2.73%
Reinforcement Learning (RL)	2	1.82%

Usage Over Time

This feature is experimental; we are continuously improving our matching algorithm.

Components

Component	Type	Add Remove
🤖 No Components Found	You can add them if they exist; e.g. Mask R-CNN uses RoIAlign

Categories

Add Remove

Large Batch Optimization