On the Transformer Growth for Progressive BERT Training

23 Oct 2020 Xiaotao Gu Liyuan Liu Hongkun Yu Jing Li Chen Chen Jiawei Han

As the excessive pre-training cost arouses the need to improve efficiency, considerable efforts have been made to train BERT progressively--start from an inferior but low-cost model and gradually increase the computational complexity. Our objective is to help advance the understanding of such Transformer growth and discover principles that guide progressive training... (read more)

PDF Abstract
No code implementations yet. Submit your code now


Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper