2 Jun 2020 • Andy Wagner, Tiyasa Mitra, Mrinal Iyer, Godfrey Da Costa, Marc Tremblay
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
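As an illustration, here is a minimal sketch of BERT-style MLM corruption, assuming the standard 15% selection rate and the 80/10/10 mask/random/keep split; `MASK_TOKEN_ID` and `VOCAB_SIZE` are placeholder values for a BERT-style WordPiece vocabulary, not values taken from this paper:

```python
import random

MASK_TOKEN_ID = 103   # [MASK] id in the standard BERT vocab (assumption)
VOCAB_SIZE = 30522    # BERT-base WordPiece vocab size (assumption)

def mlm_corrupt(token_ids, mask_prob=0.15, seed=None):
    """Select ~mask_prob of positions; of those, replace 80% with
    [MASK], 10% with a random token, and leave 10% unchanged.
    Returns (corrupted_ids, labels), where labels is -100 at
    unselected positions so they can be ignored in the loss."""
    rng = random.Random(seed)
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must reconstruct this token
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = MASK_TOKEN_ID          # 80%: [MASK]
            elif roll < 0.9:
                corrupted[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # else 10%: keep the original token unchanged
    return corrupted, labels
```

The model is then trained to predict the original token at every position where `labels` is not -100, which is the reconstruction objective described above.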