Language modeling is the task of predicting the next word or character in a document.
(Image credit: Exploring the Limits of Language Modeling)
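To make the definition concrete, here is a minimal sketch of next-word prediction using a toy bigram model; the corpus and the resulting probabilities are illustrative assumptions, not drawn from any benchmark below.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Estimate P(next word | current word) from bigram counts.
bigrams = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    bigrams[cur][nxt] += 1

def next_word_probs(word):
    counts = bigrams[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
```

Neural language models replace the count table with a learned function, but the prediction target is the same distribution over the next token.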
We propose a new benchmark corpus to be used for measuring progress in statistical language modeling.
We introduce "talking-heads attention" - a variation on multi-head attention which includes linearprojections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additionalcomputation, talking-heads attention leads to better perplexities on masked language modeling tasks, aswell as better quality when transfer-learning to language comprehension and question answering tasks.
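The following NumPy sketch shows that mechanism, under the simplifying assumption that the number of heads is the same before and after each projection (the paper also allows them to differ); P_logits and P_weights stand for the extra learned mixing matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def talking_heads_attention(Q, K, V, P_logits, P_weights):
    """Q, K, V: (heads, seq, d_head); P_logits, P_weights: (heads, heads)."""
    d = Q.shape[-1]
    logits = np.einsum('hqd,hkd->hqk', Q, K) / np.sqrt(d)   # standard per-head attention logits
    logits = np.einsum('hqk,hg->gqk', logits, P_logits)     # mix logits across heads, before softmax
    weights = softmax(logits, axis=-1)
    weights = np.einsum('gqk,gh->hqk', weights, P_weights)  # mix weights across heads, after softmax
    return np.einsum('hqk,hkd->hqd', weights, V)

heads, seq, d_head = 4, 8, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((heads, seq, d_head)) for _ in range(3))
P_l, P_w = rng.standard_normal((heads, heads)), rng.standard_normal((heads, heads))
print(talking_heads_attention(Q, K, V, P_l, P_w).shape)  # (4, 8, 16)
```

Setting both mixing matrices to the identity recovers ordinary multi-head attention, which is why only a small number of extra parameters (heads x heads per matrix) is needed.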
Masked language modeling approaches corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not.
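As a sketch of how that training signal is constructed, the snippet below corrupts a token sequence with samples from a stand-in generator and emits the per-token binary labels the discriminator is trained on. The toy vocabulary and the uniform-sampling generator are assumptions for illustration; in the actual setup the generator is a small masked language model.

```python
import random

def make_rtd_example(tokens, generator_sample, mask_rate=0.15):
    """Return (corrupted tokens, labels), where label 1 means 'replaced'."""
    corrupted, labels = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            new_tok = generator_sample(tok)     # plausible alternative from the generator
            corrupted.append(new_tok)
            labels.append(int(new_tok != tok))  # a sample equal to the original counts as 'original'
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

# Stand-in generator: uniform over a toy vocabulary.
vocab = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]
corrupted, labels = make_rtd_example(
    ["the", "cat", "sat", "on", "the", "mat"],
    generator_sample=lambda _tok: random.choice(vocab),
    mask_rate=0.5,  # high rate so the toy example visibly changes
)
print(corrupted, labels)
```

Because the loss is defined over every input position rather than only the masked subset, this objective is more sample-efficient than masked language modeling.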
Language models have become a key step in achieving state-of-the-art results in many different Natural Language Processing (NLP) tasks.
Large transformer-based language models (LMs) trained on huge text corpora have shown unparalleled generation capabilities.
We show that web-crawled data is preferable to Wikipedia data.