Inverse Square Root Schedule

Inverse Square Root is a learning rate schedule $1 / \sqrt{\max\left(n, k\right)}$, where $n$ is the current training iteration and $k$ is the number of warm-up steps. This holds the learning rate constant at $1 / \sqrt{k}$ for the first $k$ steps, then decays it in proportion to $1 / \sqrt{n}$ until pre-training is over. The decay is polynomial rather than exponential: the step count must quadruple for the learning rate to halve.
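The formula can be sketched in a few lines of Python. This is a minimal illustration rather than any library's implementation: the function name `inverse_sqrt_lr`, its parameter names, and the warm-up value of 10,000 steps used in the example are assumptions chosen for readability.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int) -> float:
    """Return 1 / sqrt(max(step, warmup_steps)).

    `step` plays the role of n and `warmup_steps` the role of k.
    Both names are hypothetical, picked for this sketch.
    """
    if warmup_steps <= 0:
        raise ValueError("warmup_steps must be positive")
    return 1.0 / math.sqrt(max(step, warmup_steps))


# Illustrative warm-up length (an assumed value, not taken from the text above).
k = 10_000
for n in (0, 5_000, 10_000, 40_000, 160_000):
    print(n, inverse_sqrt_lr(n, k))
```

With these assumed values the rate stays flat at $1 / \sqrt{k}$ up to step $k$, and each fourfold increase in $n$ after that halves it.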

Latest Papers

PAPER DATE
UCD-CS at W-NUT 2020 Shared Task-3: A Text to Text Approach for COVID-19 Event Extraction on Social Media
Congcong Wang, David Lillis
2020-09-21
PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data
Diedre Carmo, Marcos Piau, Israel Campiotti, Rodrigo Nogueira, Roberto Lotufo
2020-08-20
Lite Training Strategies for Portuguese-English and English-Portuguese Translation
Alexandre Lopes, Rodrigo Nogueira, Roberto Lotufo, Helio Pedrini
2020-08-20
Investigating Pretrained Language Models for Graph-to-Text Generation
Leonardo F. R. Ribeiro, Martin Schmitt, Hinrich Schütze, Iryna Gurevych
2020-07-16
HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
2020-07-12
Normalizador Neural de Datas e Endereços
Gustavo Plensack, Paulo Finardi
2020-06-27
Text-to-Text Pre-Training for Data-to-Text Tasks
Mihir Kale
2020-05-21
Evaluating Machines by their Real-World Language Use
Rowan Zellers, Ari Holtzman, Elizabeth Clark, Lianhui Qin, Ali Farhadi, Yejin Choi
2020-04-07
TTTTTackling WinoGrande Schemas
Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
2020-03-18
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
2019-10-23
