Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the language modeling setting.
SOTA for Language Modelling on Hutter Prize
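A minimal sketch of the limitation that sentence refers to (illustrative only, not the paper's code): a vanilla Transformer language model is trained on independent fixed-length segments, so no dependency can be learned across a segment boundary.

```python
def fixed_length_segments(token_ids, context_len=512):
    """Split a token stream into independent fixed-length training segments."""
    return [
        token_ids[i:i + context_len]
        for i in range(0, len(token_ids), context_len)
    ]

tokens = list(range(1300))            # stand-in for a tokenized corpus
segments = fixed_length_segments(tokens, context_len=512)
# Each segment is modeled in isolation, so a dependency that spans the
# boundary between segments[0] and segments[1] can never be captured.
print([len(s) for s in segments])     # [512, 512, 276]
```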
Language model pretraining has led to significant performance gains, but careful comparison between different approaches is challenging.
SOTA for Question Answering on SQuAD2.0 dev (using extra training data)
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
SOTA for Common Sense Reasoning on SWAG
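A minimal usage sketch, assuming the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint (neither is part of the original listing): BERT's masked-language-model head conditions on context from both sides of the masked token, which is the "bidirectional" part of the name.

```python
from transformers import pipeline

# Fill-mask pipeline backed by a pretrained bidirectional encoder.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
predictions = fill_mask("The capital of France is [MASK].")
for p in predictions[:3]:
    # Each prediction carries the proposed token and its score.
    print(p["token_str"], round(p["score"], 3))
```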
Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs.
#2 best model for Machine Translation on WMT2016 English-Russian
Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets.
SOTA for Language Modelling on Text8 (using extra training data)