Language Models

BERT, or Bidirectional Encoder Representations from Transformers, improves upon standard left-to-right Transformer pre-training by removing the unidirectionality constraint through a masked language model (MLM) pre-training objective. The masked language model randomly masks some of the tokens in the input, and the objective is to predict the original vocabulary id of each masked word based only on its context. Unlike left-to-right language model pre-training, the MLM objective enables the representation to fuse the left and the right context, which allows pre-training of a deep bidirectional Transformer. In addition to the masked language model, BERT uses a next sentence prediction (NSP) task that jointly pre-trains text-pair representations.
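
As a concrete illustration of the MLM objective, the sketch below fills in a masked token with a pre-trained BERT model. It is a minimal example, assuming the Hugging Face transformers library; the bert-base-uncased checkpoint and the example sentence are illustrative choices, not prescribed by the paper.

# Minimal sketch of masked-token prediction with a pre-trained BERT model.
# Assumes the Hugging Face `transformers` library; "bert-base-uncased" and
# the example sentence are illustrative, not taken from the paper.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the masked token using both the left and the right context.
for prediction in unmasker("The man went to the [MASK] to buy some milk."):
    print(prediction["token_str"], round(prediction["score"], 3))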

There are two steps in BERT: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data over different pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all of the parameters are fine-tuned using labeled data from the downstream tasks. Each downstream task has separate fine-tuned models, even though they are initialized with the same pre-trained parameters.

Source: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
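
As a concrete sketch of the fine-tuning step, the example below initializes a classifier from the pre-trained BERT parameters and runs one training step on a toy labeled batch. It assumes the Hugging Face transformers library and PyTorch; the bert-base-uncased checkpoint, the binary sentiment task, and the two example sentences are illustrative assumptions, not part of the original description.

# Minimal fine-tuning sketch: initialize from pre-trained parameters, then
# update all parameters on labeled downstream data. Assumes Hugging Face
# `transformers` and PyTorch; the task and data below are toy placeholders.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# The encoder weights come from the pre-trained checkpoint; only the
# classification head on top is newly (randomly) initialized.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy labeled examples for an illustrative binary sentiment task.
texts = ["a great movie", "a dull movie"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One fine-tuning step: all parameters (encoder and head) are updated.
model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()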

Tasks

Task                    Papers    Share
Retrieval                  118   12.29%
Language Modelling         107   11.15%
Question Answering          61    6.35%
Large Language Model        39    4.06%
Sentiment Analysis          33    3.44%
Text Classification         33    3.44%
Sentence                    33    3.44%
Information Retrieval       22    2.29%
Text Generation             18    1.88%
