ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ICLR 2020 · Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, the paper proposes a more sample-efficient pre-training task called replaced token detection: instead of masking the input, it corrupts the input by replacing some tokens with plausible alternatives sampled from a small generator network, and then trains a discriminative model that predicts, for every token in the corrupted input, whether that token was replaced by a generator sample or not. The contextual representations learned this way substantially outperform those learned by BERT given the same model size, data, and compute.
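The replaced token detection setup can be summarized in a short sketch. The following is a minimal, illustrative PyTorch sketch, not the authors' implementation: the tiny two-layer encoders, the `TinyEncoder` and `electra_step` names, and the vocabulary size, hidden size, masking rate, and loss weight are assumptions chosen for brevity rather than the paper's actual configuration.

```python
# Minimal sketch of ELECTRA-style replaced token detection (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, MAX_LEN = 1000, 64, 32   # toy sizes, not the paper's
MASK_ID, MASK_PROB = 0, 0.15            # reserve id 0 for [MASK]


class TinyEncoder(nn.Module):
    """Toy Transformer encoder standing in for a full BERT-sized stack."""
    def __init__(self, vocab, hidden):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ids):
        return self.encoder(self.embed(ids))  # (batch, seq, hidden)


generator = TinyEncoder(VOCAB, HIDDEN)      # small MLM generator
gen_head = nn.Linear(HIDDEN, VOCAB)         # predicts tokens at masked positions
discriminator = TinyEncoder(VOCAB, HIDDEN)  # model kept for downstream tasks
disc_head = nn.Linear(HIDDEN, 1)            # per-token "was this replaced?" logit


def electra_step(ids):
    # 1) Mask a random subset of the input tokens.
    mask = torch.rand(ids.shape) < MASK_PROB
    masked_ids = ids.masked_fill(mask, MASK_ID)

    # 2) The generator fills in masked positions; replacements are sampled,
    #    and no gradient flows through the sampling step.
    gen_logits = gen_head(generator(masked_ids))
    samples = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted = torch.where(mask, samples, ids)

    # 3) The generator itself is trained with the usual MLM loss.
    mlm_loss = F.cross_entropy(gen_logits[mask], ids[mask])

    # 4) The discriminator predicts, for EVERY token, whether it was replaced.
    is_replaced = (corrupted != ids).float()
    disc_logits = disc_head(discriminator(corrupted)).squeeze(-1)
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

    # Combined objective; the discriminator loss gets a larger weight (assumed 50 here).
    return mlm_loss + 50.0 * disc_loss


loss = electra_step(torch.randint(1, VOCAB, (8, MAX_LEN)))
loss.backward()
```

Note that the discriminator's loss is defined over all input tokens rather than only the small masked subset, which is the source of the claimed sample efficiency; after pre-training, the generator is discarded and only the discriminator is fine-tuned on downstream tasks.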


