Adaptive Softmax

Introduced by Grave et al. in Efficient softmax approximation for GPUs

Adaptive Softmax is a speedup technique for computing probability distributions over words. It is inspired by the class-based hierarchical softmax, but builds its word classes to minimize computation time. It achieves efficiency by explicitly accounting for the cost of matrix multiplication on parallel systems (such as GPUs) and combining this with a few key observations: keeping a shortlist of frequent words in the root node, and assigning reduced capacity to the clusters of rare words.

Source: Efficient softmax approximation for GPUs
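
A minimal usage sketch (Python/PyTorch): the snippet below uses PyTorch's built-in torch.nn.AdaptiveLogSoftmaxWithLoss, which implements this technique; the hidden size, vocabulary size, and cluster cutoffs are illustrative assumptions rather than values from the paper.

import torch
import torch.nn as nn

hidden_size = 512        # dimensionality of the model's hidden states (assumed)
vocab_size = 50_000      # total vocabulary size (assumed)

# Frequent words stay in the head ("shortlist"); words past each cutoff fall
# into tail clusters whose projections shrink by div_value, which is how the
# capacity assigned to rare words is reduced.
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_size,
    n_classes=vocab_size,
    cutoffs=[2_000, 10_000],  # shortlist of 2k frequent words + two tail clusters
    div_value=4.0,            # each successive tail projection is 4x smaller
)

hidden = torch.randn(32, hidden_size)             # a batch of hidden states
targets = torch.randint(0, vocab_size, (32,))     # gold next-word indices
output, loss = adaptive_softmax(hidden, targets)  # log-probs of targets and mean NLL
log_probs = adaptive_softmax.log_prob(hidden)     # full (32, vocab_size) distribution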

Latest Papers

PAPER DATE
Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li
2020-10-22
Memformer: The Memory-Augmented Transformer
Qingyang Wu, Zhenzhong Lan, Jing Gu, Zhou Yu
2020-10-14
Pay Attention when Required
Swetha Mandava, Szymon Migacz, Alex Fit-Florea
2020-09-09
The Jazz Transformer on the Front Line: Exploring the Shortcomings of AI-composed Music through Quantitative Measures
Shih-Lun Wu, Yi-Hsuan Yang
2020-08-04
Automatic Composition of Guitar Tabs by Transformers and Groove Modeling
Yu-Hua Chen, Yu-Hsiang Huang, Wen-Yi Hsiao, Yi-Hsuan Yang
2020-08-04
DeLighT: Very Deep and Light-weight Transformer
Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi
2020-08-03
Language Modelling for Source Code with Transformer-XL
Thomas Dowdell, Hongyu Zhang
2020-07-31
Do Transformers Need Deep Long-Range Memory
Jack W. Rae, Ali Razavi
2020-07-07
Do Transformers Need Deep Long-Range Memory?
Jack Rae, Ali Razavi
2020-07-01
Probing for Referential Information in Language Models
Ionut-Teodor Sorodoc, Kristina Gulordava, Gemma Boleda
2020-07-01
Mind The Facts: Knowledge-Boosted Coherent Abstractive Text Summarization
Beliz Gunel, Chenguang Zhu, Michael Zeng, Xuedong Huang
2020-06-27
Exploring Transformers for Large-Scale Speech Recognition
Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong
2020-05-19
Improving Neural Language Generation with Spectrum Control
Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu
2020-05-01
Finnish Language Modeling with Deep Transformer Models
Abhilash Jain, Aku Ruohe, Stig-Arne Grönroos, Mikko Kurimo
2020-03-14
DeFINE: Deep Factorized Input Word Embeddings for Neural Sequence Modeling
Anonymous
2020-01-01
Improving Neural Language Generation with Spectrum Control
Anonymous
2020-01-01
Neural Academic Paper Generation
Samet Demir, Uras Mutlu, Özgur Özdemir
2019-12-02
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi
2019-11-27
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap
2019-11-13
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg
2019-10-23
Stabilizing Transformers for Reinforcement Learning
Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell
2019-10-13
GDP: Generalized Device Placement for Dataflow Graphs
Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter C. Ma, Qiumin Xu, Ming Zhong, Hanxiao Liu, Anna Goldie, Azalia Mirhoseini, James Laudon
2019-09-28
A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning
Fang Liu, Ge Li, Bolin Wei, Xin Xia, Zhiyi Fu, Zhi Jin
2019-09-16
Ouroboros: On Accelerating Training of Transformer-Based Language Models
Qian Yang, Zhouyuan Huo, Wenlin Wang, Heng Huang, Lawrence Carin
2019-09-14
A Tensorized Transformer for Language Modeling
Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song, Ming Zhou
2019-06-24
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
2019-06-19
Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)
Mariya Toneva, Leila Wehbe
2019-05-28
Transformer-XL: Language Modeling with Longer-Term Dependency
Zihang Dai*, Zhilin Yang*, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
2019-05-01
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
2019-01-09
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski, Michael Auli
2018-09-28
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier
2016-12-23
Efficient softmax approximation for GPUs
Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou
2016-09-14

Components

COMPONENT TYPE
No components found.

Categories