ELECTRA is a transformer pre-training approach that trains two transformer models: a generator and a discriminator. The generator, trained as a masked language model, replaces tokens in the input sequence, and the discriminator (the ELECTRA contribution) tries to identify which tokens in the sequence were replaced by the generator. This pre-training task is called replaced token detection, and it serves as a replacement for masked language modeling.

Source: ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
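To make replaced token detection concrete, the sketch below shows the two-model setup in PyTorch. It is an illustrative toy, not the official implementation: the TinyEncoder, Generator, and Discriminator classes, the model sizes, and the masking helper are assumptions; only the overall objective (a masked-language-modeling loss for the generator plus a weighted per-token "replaced vs. original" loss for the discriminator on the generator-corrupted sequence) follows the paper.

```python
# Toy sketch of ELECTRA-style replaced token detection (illustrative, not the
# official implementation). Model classes, sizes, and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, HIDDEN, MASK_ID = 1000, 64, 0  # id 0 reserved for a [MASK]-like token


class TinyEncoder(nn.Module):
    """Small stand-in for a transformer encoder: embedding + one self-attention layer."""

    def __init__(self, vocab_size: int, hidden: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.encoder(self.embed(token_ids))  # (batch, seq, hidden)


class Generator(nn.Module):
    """Masked language model: predicts the original token at each masked position."""

    def __init__(self):
        super().__init__()
        self.encoder = TinyEncoder(VOCAB_SIZE, HIDDEN)
        self.mlm_head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, masked_ids: torch.Tensor) -> torch.Tensor:
        return self.mlm_head(self.encoder(masked_ids))  # (batch, seq, vocab) logits


class Discriminator(nn.Module):
    """ELECTRA discriminator: per-token prediction, replaced (1) vs. original (0)."""

    def __init__(self):
        super().__init__()
        self.encoder = TinyEncoder(VOCAB_SIZE, HIDDEN)
        self.rtd_head = nn.Linear(HIDDEN, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.rtd_head(self.encoder(token_ids)).squeeze(-1)  # (batch, seq) logits


def replaced_token_detection_step(generator, discriminator, input_ids, mask_prob=0.15):
    # 1) Mask a random subset of positions (force at least one per sequence).
    mask = torch.rand(input_ids.shape) < mask_prob
    mask[:, 0] = True
    masked_ids = input_ids.clone()
    masked_ids[mask] = MASK_ID

    # 2) Generator is trained with an MLM loss on the masked positions only.
    gen_logits = generator(masked_ids)
    mlm_loss = F.cross_entropy(gen_logits[mask], input_ids[mask])

    # Sample plausible replacements from the generator; sampling is detached so the
    # generator is not updated through the discriminator's loss.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_logits[mask]).sample()
    corrupted_ids = input_ids.clone()
    corrupted_ids[mask] = sampled

    # 3) Discriminator labels every token of the corrupted sequence. If the generator
    #    happens to sample the original token, that position is labeled "original".
    labels = (corrupted_ids != input_ids).float()
    rtd_loss = F.binary_cross_entropy_with_logits(discriminator(corrupted_ids), labels)

    # Joint objective: MLM loss plus a weighted discriminator loss (lambda = 50 in the paper).
    return mlm_loss + 50.0 * rtd_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    fake_batch = torch.randint(1, VOCAB_SIZE, (2, 16))  # random token ids, no real text
    loss = replaced_token_detection_step(Generator(), Discriminator(), fake_batch)
    loss.backward()
    print(f"combined pre-training loss: {loss.item():.3f}")
```

After pre-training, only the discriminator is kept and fine-tuned on downstream tasks; the generator exists solely to produce plausible replacements during pre-training.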

Latest Papers

Taking Notes on the Fly Helps Language Pre-Training
Anonymous
2021-01-01
A Multi-modal Deep Learning Model for Video Thumbnail Selection
Zhifeng Yu, Nanchun Shi
2020-12-31
DEER: A Data Efficient Language Model for Event Temporal Reasoning
Rujun Han, Xiang Ren, Nanyun Peng
2020-12-30
Learning to Retrieve Entity-Aware Knowledge and Generate Responses with Copy Mechanism for Task-Oriented Dialogue Systems
Chao-Hong Tan, Xiaoyu Yang, Zi'ou Zheng, Tianda Li, Yufei Feng, Jia-Chen Gu, Quan Liu, Dan Liu, Zhen-Hua Ling, Xiaodan Zhu
2020-12-22
Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training
Chen Xing, Wencong Xiao, Yong Li, Wei Lin
2020-12-16
Pre-Training Transformers as Energy-Based Cloze Models
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
2020-12-15
A Clarifying Question Selection System from NTES_ALONG in Convai3 Challenge
Wenjie Ou, Yue Lin
2020-10-27
Commonsense knowledge adversarial dataset that challenges ELECTRA
Gongqi Lin, Yuan Miao, Xiaoyong Yang, Wenwu Ou, Lizhen Cui, Wei Guo, Chunyan Miao
2020-10-25
Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation
Jan Christian Blaise Cruz, Jose Kristian Resabal, James Lin, Dan John Velasco, Charibeth Cheng
2020-10-22
German's Next Language Model
Branden Chan, Stefan Schweter, Timo Möller
2020-10-21
Aspect-based Document Similarity for Research Papers
Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm
2020-10-13
Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue
Longxiang Liu, Zhuosheng Zhang, Hai Zhao, Xi Zhou, Xiang Zhou
2020-09-14
Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability
Mayank Chhipa, Hrushikesh Mahesh Vazurkar, Abhijeet Kumar, Mridul Mishra
2020-09-09
Roles and Utilization of Attention Heads in Transformer-based Neural Language Models
Jae-young Jo, Sung-Hyon Myaeng
2020-07-01
MC-BERT: Efficient Language Pre-Training via a Meta Controller
Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Liwei Wang, Jiang Bian, Tie-Yan Liu
2020-06-10
Learning-to-Rank with BERT in TF-Ranking
Shuguang Han, Xuanhui Wang, Mike Bendersky, Marc Najork
2020-04-17
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
2020-03-23

Categories