Inverse Square Root Schedule

Inverse Square Root is a learning rate schedule $1 / \sqrt{\max\left(n, k\right)}$, where $n$ is the current training iteration and $k$ is the number of warm-up steps. This holds the learning rate constant at $1 / \sqrt{k}$ for the first $k$ steps, then decays it in proportion to the inverse square root of the iteration number, $1 / \sqrt{n}$, until pre-training is over.
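
As a minimal sketch of the formula above, the schedule can be written as a plain Python function. The names `inverse_sqrt_lr`, `step`, `warmup_steps`, and the optional `base_lr` multiplier are illustrative assumptions and do not come from any particular library; with `base_lr=1.0` the function returns exactly $1 / \sqrt{\max\left(n, k\right)}$.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int, base_lr: float = 1.0) -> float:
    """Return base_lr / sqrt(max(step, warmup_steps)).

    step         -- current training iteration (n in the formula)
    warmup_steps -- number of warm-up steps (k in the formula), must be >= 1
    base_lr      -- hypothetical scale factor, not part of the original formula
    """
    if warmup_steps < 1:
        # k = 0 together with n = 0 would divide by zero.
        raise ValueError("warmup_steps must be at least 1")
    return base_lr / math.sqrt(max(step, warmup_steps))


if __name__ == "__main__":
    k = 100
    for n in (1, 50, 100, 400, 10_000):
        print(n, inverse_sqrt_lr(n, k))
```

For $k = 100$, every step up to 100 gives the same value $1 / \sqrt{100}$, and quadrupling the step count after that (for example from 100 to 400) halves the learning rate, which is polynomial rather than exponential decay.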

Tasks

Task	Papers	Share
Language Modelling	97	9.63%
Question Answering	65	6.45%
Text Generation	48	4.77%
Sentence	44	4.37%
Translation	32	3.18%
Retrieval	31	3.08%
Machine Translation	26	2.58%
Natural Language Understanding	22	2.18%
Semantic Parsing	19	1.89%

Categories

Learning Rate Schedules