DynaBERT: Dynamic BERT with Adaptive Width and Depth

Pre-trained language models like BERT, though powerful in many natural language processing tasks, are expensive in both computation and memory. To alleviate this problem, one approach is to compress them for specific tasks before deployment. However, recent works on compressing BERT usually compress the large model to a single fixed smaller size, and so cannot fully satisfy the requirements of different edge devices with varying hardware capabilities. This paper proposes a dynamic BERT model (abbreviated as DynaBERT), which can flexibly adjust its size and latency by selecting an adaptive width and depth. Training DynaBERT involves first training a width-adaptive BERT and then allowing both adaptive width and depth, by distilling knowledge from the full-sized model into small sub-networks. Network rewiring is also used so that the more important attention heads and neurons are shared by more sub-networks. Comprehensive experiments under various efficiency constraints show that DynaBERT (or DynaRoBERTa) at its largest size performs comparably to BERT-base (or RoBERTa-base), while at smaller widths and depths it consistently outperforms existing BERT compression methods.

Published at NeurIPS 2020.
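To make the width/depth adaptation described in the abstract concrete, below is a minimal Python sketch of how a sub-network might be indexed out of a BERT-base encoder. This is an illustration under stated assumptions, not the authors' implementation: the function `subnetwork_config` and the uniform layer-spacing rule are hypothetical, while the multiplier ranges follow the settings reported in the paper.

```python
import math

# Hypothetical sketch: carve a (width, depth) sub-network out of a
# 12-layer, 12-head BERT-base encoder. Names are illustrative;
# DynaBERT's released code may select layers differently.
def subnetwork_config(num_layers=12, num_heads=12, ffn_size=3072,
                      width_mult=1.0, depth_mult=1.0):
    """Describe which layers, attention heads, and FFN neurons a
    sub-network keeps, assuming heads and neurons are already sorted
    by importance (the role of DynaBERT's network rewiring)."""
    # Depth adaptation: keep a roughly uniformly spaced subset of layers.
    kept_layers = round(num_layers * depth_mult)
    layer_ids = sorted({round(i * num_layers / kept_layers)
                        for i in range(kept_layers)})
    # Width adaptation: keep the leftmost (most important) heads and neurons.
    return {
        "layers": layer_ids,
        "heads_per_layer": math.ceil(num_heads * width_mult),
        "ffn_neurons_per_layer": math.ceil(ffn_size * width_mult),
    }

# The paper trains with width multipliers {0.25, 0.5, 0.75, 1.0} and
# depth multipliers {0.5, 0.75, 1.0}; e.g. the smallest sub-network:
print(subnetwork_config(width_mult=0.25, depth_mult=0.5))
# {'layers': [0, 2, 4, 6, 8, 10], 'heads_per_layer': 3,
#  'ffn_neurons_per_layer': 768}
```

Keeping the leftmost heads and neurons is what makes the rewiring step matter: once units are reordered by importance, every smaller width is a prefix of every larger one, so all sub-networks share the most useful parameters.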
