Position Masking for Language Models

Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. This technique has proven effective, leading to strong results across many NLP benchmarks.
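The corruption step can be sketched as follows. This is a minimal illustration, not BERT's actual implementation: it assumes a toy word-level vocabulary and uses BERT's published 80/10/10 replacement scheme (80% [MASK], 10% random token, 10% unchanged) applied to tokens selected with 15% probability.

```python
import random

MASK = "[MASK]"
# Toy vocabulary for the 10% random-replacement case (assumption for illustration).
VOCAB = ["the", "cat", "dog", "sat", "on", "mat"]

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Corrupt a token sequence in the style of BERT's MLM pre-training.

    Each token is selected with probability `mask_prob`. A selected token
    is replaced by [MASK] 80% of the time, by a random vocabulary token
    10% of the time, and left unchanged 10% of the time. Returns the
    corrupted sequence and (position, original token) pairs that serve
    as reconstruction targets for the model.
    """
    rng = random.Random(seed)
    corrupted, targets = list(tokens), []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets.append((i, tok))  # the original token is the label
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK            # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)  # 10%: random token
            # else 10%: keep the original token
    return corrupted, targets

tokens = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, targets = mask_tokens(tokens, mask_prob=0.5, seed=1)
```

The model is then trained to predict the original token at each target position, typically via a cross-entropy loss computed only over the selected positions.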
