The Lottery Ticket Hypothesis for Pre-trained BERT Networks

23 Jul 2020Tianlong ChenJonathan FrankleShiyu ChangSijia LiuYang ZhangZhangyang WangMichael Carbin

In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training on a range of downstream tasks, and similar trends are emerging in other areas of deep learning. In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy and transferring to other tasks... (read more)

PDF Abstract

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper