Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling.
Ranked #1 on Language Modelling on One Billion Word
With the capability of modeling bidirectional contexts, denoising-autoencoding-based pretraining approaches such as BERT achieve better performance than pretraining approaches based on autoregressive language modeling (a toy contrast of the two objectives is sketched below).
Ranked #1 on Text Classification on DBpedia
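A minimal sketch, not taken from the paper, of how the two pretraining objectives differ when building training targets for a toy token-ID sequence. The `MASK_ID` value and the 15% masking rate are illustrative assumptions.

```python
# Contrast denoising autoencoding (BERT-style) with autoregressive LM (GPT-style)
# on a toy token-id sequence. MASK_ID and the masking rate are assumptions.
import random

MASK_ID = 0                      # hypothetical [MASK] token id
tokens = [12, 47, 9, 88, 23, 5]  # toy token-id sequence

# Denoising autoencoding: corrupt some positions, predict the originals
# using bidirectional context.
corrupted, mlm_targets = [], []
for t in tokens:
    if random.random() < 0.15:   # mask roughly 15% of positions
        corrupted.append(MASK_ID)
        mlm_targets.append(t)    # the model must recover the original token
    else:
        corrupted.append(t)
        mlm_targets.append(None) # no loss at unmasked positions

# Autoregressive language modeling: predict each token from its left context
# only, so inputs and targets are the sequence shifted by one position.
ar_inputs = tokens[:-1]
ar_targets = tokens[1:]

print(corrupted, mlm_targets)
print(ar_inputs, ar_targets)
```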
We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token (a toy sketch of these two noising operations follows below).
Ranked #9 on Question Answering on SQuAD1.1 dev (F1 metric)
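A minimal sketch, under stated assumptions, of the two noising operations described above: sentence-order shuffling and text infilling, where a sampled span of tokens is replaced by a single mask token. The `<mask>` string and the span-length distribution are illustrative choices, not the paper's implementation.

```python
# Illustrative noising: shuffle sentence order, and replace one span of tokens
# with a single mask token. Span lengths here are uniform, purely for demo.
import random

def shuffle_sentences(sentences):
    shuffled = sentences[:]
    random.shuffle(shuffled)
    return shuffled

def infill(tokens, mask="<mask>", max_span=3):
    # Replace one randomly chosen span (possibly zero-length) with a single mask.
    span_len = random.randint(0, max_span)
    start = random.randrange(0, len(tokens) - span_len + 1)
    return tokens[:start] + [mask] + tokens[start + span_len:]

sentences = ["the cat sat .", "it was warm .", "then it slept ."]
print(shuffle_sentences(sentences))
print(infill("the cat sat on the mat".split()))
```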
We show that the use of web-crawled data is preferable to the use of Wikipedia data.
Ranked #1 on Part-Of-Speech Tagging on French GSD
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
Ranked #1 on Linguistic Acceptability on CoLA
Tasks: Common Sense Reasoning, Coreference Resolution, Document Summarization, Linguistic Acceptability, Machine Translation, Natural Language Inference, Question Answering, Semantic Textual Similarity, Sentiment Analysis, Text Classification, Transfer Learning, Word Sense Disambiguation
Instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not.
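A minimal sketch, not the paper's code, of building the replaced-token-detection labels described above: some positions are swapped for "generator samples" (here simply random vocabulary IDs), and the discriminator's target at each position is 1 if the token was replaced and 0 otherwise. `VOCAB_SIZE` and the 15% replacement rate are assumptions.

```python
# Construct per-token labels for replaced-token detection on a toy sequence.
import random

VOCAB_SIZE = 100                 # hypothetical vocabulary size
tokens = [12, 47, 9, 88, 23, 5]  # toy token-id sequence

corrupted, labels = [], []
for t in tokens:
    if random.random() < 0.15:                 # replace roughly 15% of positions
        sample = random.randrange(VOCAB_SIZE)  # stand-in for a generator sample
        corrupted.append(sample)
        labels.append(int(sample != t))        # replaced -> 1, unchanged -> 0
    else:
        corrupted.append(t)
        labels.append(0)

print(corrupted, labels)
# A discriminator would be trained with a per-token binary loss against `labels`.
```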
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.
Ranked #1 on Natural Language Inference on QNLI
In this paper, we describe a novel approach for detecting humor in short texts using BERT sentence embeddings.
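A minimal sketch of the general recipe implied above: encode each short text into a BERT-based sentence embedding and train a classifier on top. The encoder name, the sentence-transformers/scikit-learn stack, and the toy data are assumptions for illustration, not the paper's actual pipeline.

```python
# Encode short texts with a BERT-style sentence encoder, then fit a simple
# classifier on the resulting fixed-size vectors. Toy labels only.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

texts = ["Why did the chicken cross the road? To get to the other side.",
         "The meeting is scheduled for 3 pm on Thursday."]
labels = [1, 0]  # 1 = humorous, 0 = not humorous (toy labels)

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-style sentence encoder
embeddings = encoder.encode(texts)                 # one fixed-size vector per text

clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
print(clf.predict(encoder.encode(["I told my wife she was drawing her eyebrows too high. She looked surprised."])))
```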
Language models have become a key component in achieving state-of-the-art results on many different Natural Language Processing (NLP) tasks.