ELECTRA is a transformer pre-training approach that trains two transformer models: a generator and a discriminator. The generator, trained as a masked language model, replaces tokens in the input sequence, and the discriminator (the ELECTRA contribution) tries to identify which tokens in the sequence were replaced by the generator. This pre-training task is called replaced token detection, and it serves as a replacement for masked language modeling.

Source: ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
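To make replaced token detection concrete, the sketch below shows the two-model setup in PyTorch. It is an illustrative toy, not the official implementation: the TinyEncoder, Generator, and Discriminator classes, the model sizes, and the masking helper are assumptions; only the overall objective (a masked-language-modeling loss for the generator plus a weighted per-token "replaced vs. original" loss for the discriminator on the generator-corrupted sequence) follows the paper.

```python
# Toy sketch of ELECTRA-style replaced token detection (illustrative, not the
# official implementation). Model classes, sizes, and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, HIDDEN, MASK_ID = 1000, 64, 0  # id 0 reserved for a [MASK]-like token


class TinyEncoder(nn.Module):
    """Small stand-in for a transformer encoder: embedding + one self-attention layer."""

    def __init__(self, vocab_size: int, hidden: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.encoder(self.embed(token_ids))  # (batch, seq, hidden)


class Generator(nn.Module):
    """Masked language model: predicts the original token at each masked position."""

    def __init__(self):
        super().__init__()
        self.encoder = TinyEncoder(VOCAB_SIZE, HIDDEN)
        self.mlm_head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, masked_ids: torch.Tensor) -> torch.Tensor:
        return self.mlm_head(self.encoder(masked_ids))  # (batch, seq, vocab) logits


class Discriminator(nn.Module):
    """ELECTRA discriminator: per-token prediction, replaced (1) vs. original (0)."""

    def __init__(self):
        super().__init__()
        self.encoder = TinyEncoder(VOCAB_SIZE, HIDDEN)
        self.rtd_head = nn.Linear(HIDDEN, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.rtd_head(self.encoder(token_ids)).squeeze(-1)  # (batch, seq) logits


def replaced_token_detection_step(generator, discriminator, input_ids, mask_prob=0.15):
    # 1) Mask a random subset of positions (force at least one per sequence).
    mask = torch.rand(input_ids.shape) < mask_prob
    mask[:, 0] = True
    masked_ids = input_ids.clone()
    masked_ids[mask] = MASK_ID

    # 2) Generator is trained with an MLM loss on the masked positions only.
    gen_logits = generator(masked_ids)
    mlm_loss = F.cross_entropy(gen_logits[mask], input_ids[mask])

    # Sample plausible replacements from the generator; sampling is detached so the
    # generator is not updated through the discriminator's loss.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_logits[mask]).sample()
    corrupted_ids = input_ids.clone()
    corrupted_ids[mask] = sampled

    # 3) Discriminator labels every token of the corrupted sequence. If the generator
    #    happens to sample the original token, that position is labeled "original".
    labels = (corrupted_ids != input_ids).float()
    rtd_loss = F.binary_cross_entropy_with_logits(discriminator(corrupted_ids), labels)

    # Joint objective: MLM loss plus a weighted discriminator loss (lambda = 50 in the paper).
    return mlm_loss + 50.0 * rtd_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    fake_batch = torch.randint(1, VOCAB_SIZE, (2, 16))  # random token ids, no real text
    loss = replaced_token_detection_step(Generator(), Discriminator(), fake_batch)
    loss.backward()
    print(f"combined pre-training loss: {loss.item():.3f}")
```

After pre-training, only the discriminator is kept and fine-tuned on downstream tasks; the generator exists solely to produce plausible replacements during pre-training.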

Latest Papers

Taking Notes on the Fly Helps Language Pre-Training
Anonymous
2021-01-01
A Multi-modal Deep Learning Model for Video Thumbnail Selection
Zhifeng Yu, Nanchun Shi
2020-12-31
DEER: A Data Efficient Language Model for Event Temporal Reasoning
Rujun Han, Xiang Ren, Nanyun Peng
2020-12-30
Learning to Retrieve Entity-Aware Knowledge and Generate Responses with Copy Mechanism for Task-Oriented Dialogue Systems
Chao-Hong Tan, Xiaoyu Yang, Zi'ou Zheng, Tianda Li, Yufei Feng, Jia-Chen Gu, Quan Liu, Dan Liu, Zhen-Hua Ling, Xiaodan Zhu
2020-12-22
Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training
Chen Xing, Wencong Xiao, Yong Li, Wei Lin
2020-12-16
Pre-Training Transformers as Energy-Based Cloze Models
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
2020-12-15
A Clarifying Question Selection System from NTES_ALONG in Convai3 Challenge
Wenjie Ou, Yue Lin
2020-10-27
Commonsense knowledge adversarial dataset that challenges ELECTRA
Gongqi Lin, Yuan Miao, Xiaoyong Yang, Wenwu Ou, Lizhen Cui, Wei Guo, Chunyan Miao
2020-10-25
Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation
Jan Christian Blaise Cruz, Jose Kristian Resabal, James Lin, Dan John Velasco, Charibeth Cheng
2020-10-22
German's Next Language Model
Branden Chan, Stefan Schweter, Timo Möller
2020-10-21
Aspect-based Document Similarity for Research Papers
Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm
2020-10-13
Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue
Longxiang Liu, Zhuosheng Zhang, Hai Zhao, Xi Zhou, Xiang Zhou
2020-09-14
Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability
Mayank Chhipa, Hrushikesh Mahesh Vazurkar, Abhijeet Kumar, Mridul Mishra
2020-09-09
Roles and Utilization of Attention Heads in Transformer-based Neural Language Models
Jae-young Jo, Sung-Hyon Myaeng
2020-07-01
MC-BERT: Efficient Language Pre-Training via a Meta Controller
Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Liwei Wang, Jiang Bian, Tie-Yan Liu
2020-06-10
Learning-to-Rank with BERT in TF-Ranking
Shuguang Han, Xuanhui Wang, Mike Bendersky, Marc Najork
2020-04-17
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
2020-03-23

Categories