Transformer-XL (meaning "extra long") is a Transformer architecture that introduces the notion of recurrence to the deep self-attention network. Instead of computing the hidden states from scratch for each new segment, Transformer-XL reuses the hidden states obtained in previous segments. The reused hidden states serve as memory for the current segment, which builds up a recurrent connection between the segments. As a result, modeling very long-term dependencies becomes possible, because information can be propagated through the recurrent connections. As an additional contribution, Transformer-XL uses a new relative positional encoding formulation that generalizes to attention lengths longer than those observed during training.

Source: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
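The segment-level recurrence can be made concrete: while processing a segment, each layer attends over the current segment's hidden states concatenated with cached hidden states from the previous segment, and the cache is detached so gradients never cross segment boundaries. Below is a minimal PyTorch sketch of that caching scheme, not the authors' implementation; it uses a standard `nn.MultiheadAttention`, omits causal masking and Transformer-XL's relative positional encoding, and the names `SegmentRecurrentAttention` and `mem_len` are illustrative.

```python
import torch
import torch.nn as nn


class SegmentRecurrentAttention(nn.Module):
    """One attention layer with a Transformer-XL-style segment cache (sketch only)."""

    def __init__(self, d_model, n_heads, mem_len):
        super().__init__()
        self.mem_len = mem_len
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, memory=None):
        # x: (batch, seg_len, d_model); memory: (batch, <=mem_len, d_model) or None
        if memory is None:
            memory = x.new_zeros(x.size(0), 0, x.size(2))
        # Keys/values cover [cached states; current segment]; queries come only
        # from the current segment, so attention can reach into earlier segments.
        context = torch.cat([memory, x], dim=1)
        out, _ = self.attn(x, context, context, need_weights=False)
        # Cache the most recent hidden states for the next segment and stop
        # gradients so backpropagation never crosses a segment boundary.
        new_memory = context[:, -self.mem_len:, :].detach()
        return out, new_memory


# Process a long sequence segment by segment, carrying the cache forward.
layer = SegmentRecurrentAttention(d_model=64, n_heads=4, mem_len=32)
memory = None
tokens = torch.randn(2, 128, 64)            # (batch, total_len, d_model)
for segment in tokens.split(32, dim=1):     # fixed-length segments
    output, memory = layer(segment, memory)
```

Because only the queries are restricted to the current segment while keys and values extend into the cache, the effective context grows with depth, which is what lets the full model capture dependencies far longer than a single segment.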

Latest Papers

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li
2020-10-22
Memformer: The Memory-Augmented Transformer
Qingyang Wu, Zhenzhong Lan, Jing Gu, Zhou Yu
2020-10-14
Pay Attention when Required
Swetha Mandava, Szymon Migacz, Alex Fit Florea
2020-09-09
The Jazz Transformer on the Front Line: Exploring the Shortcomings of AI-composed Music through Quantitative Measures
Shih-Lun Wu, Yi-Hsuan Yang
2020-08-04
Automatic Composition of Guitar Tabs by Transformers and Groove Modeling
Yu-Hua Chen, Yu-Hsiang Huang, Wen-Yi Hsiao, Yi-Hsuan Yang
2020-08-04
Language Modelling for Source Code with Transformer-XL
Thomas Dowdell, Hongyu Zhang
2020-07-31
Do Transformers Need Deep Long-Range Memory?
Jack W. Rae, Ali Razavi
2020-07-07
Probing for Referential Information in Language Models
Ionut-Teodor Sorodoc, Kristina Gulordava, Gemma Boleda
2020-07-01
Mind The Facts: Knowledge-Boosted Coherent Abstractive Text Summarization
Beliz Gunel, Chenguang Zhu, Michael Zeng, Xuedong Huang
2020-06-27
Exploring Transformers for Large-Scale Speech Recognition
Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong
2020-05-19
Improving Neural Language Generation with Spectrum Control
Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu
2020-05-01
Finnish Language Modeling with Deep Transformer Models
Abhilash Jain, Aku Ruohe, Stig-Arne Grönroos, Mikko Kurimo
2020-03-14
Neural Academic Paper Generation
Samet Demir, Uras Mutlu, Özgur Özdemir
2019-12-02
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi
2019-11-27
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap
2019-11-13
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg
2019-10-23
Stabilizing Transformers for Reinforcement Learning
Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell
2019-10-13
GDP: Generalized Device Placement for Dataflow Graphs
Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter C. Ma, Qiumin Xu, Ming Zhong, Hanxiao Liu, Anna Goldie, Azalia Mirhoseini, James Laudon
2019-09-28
A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning
Fang Liu, Ge Li, Bolin Wei, Xin Xia, Zhiyi Fu, Zhi Jin
2019-09-16
Ouroboros: On Accelerating Training of Transformer-Based Language Models
Qian Yang, Zhouyuan Huo, Wenlin Wang, Heng Huang, Lawrence Carin
2019-09-14
A Tensorized Transformer for Language Modeling
Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song, Ming Zhou
2019-06-24
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
2019-06-19
Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)
Mariya Toneva, Leila Wehbe
2019-05-28
Transformer-XL: Language Modeling with Longer-Term Dependency
Zihang Dai*, Zhilin Yang*, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
2019-05-01
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
2019-01-09