no code implementations • 29 Mar 2023 • Zhiyi Li, Douglas Orr, Valeriu Ohan, Godfrey Da Costa, Tom Murray, Adam Sanders, Deniz Beker, Dominic Masters
Furthermore, static sparsity generally outperforms dynamic sparsity.
no code implementations • 2 Jun 2020 • Andy Wagner, Tiyasa Mitra, Mrinal Iyer, Godfrey Da Costa, Marc Tremblay
Masked language modeling (MLM) pre-training models such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
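The MLM corruption step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper name `corrupt_for_mlm` is hypothetical, and for simplicity it only applies the [MASK] replacement (BERT additionally replaces some selected tokens with random tokens or leaves them unchanged).

```python
import random

def corrupt_for_mlm(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with [MASK]; return the corrupted
    sequence and the indices the model must reconstruct.

    mask_prob=0.15 follows the rate used in BERT pre-training."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), []
    for i in range(len(tokens)):
        if rng.random() < mask_prob:
            corrupted[i] = mask_token
            targets.append(i)  # loss is computed only at these positions
    return corrupted, targets

tokens = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, targets = corrupt_for_mlm(tokens, mask_prob=0.5)
```

The reconstruction objective then scores the model's predictions only at the masked positions in `targets`, rather than over the whole sequence.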