Syntactic Structure Distillation Pretraining For Bidirectional Encoders

27 May 2020 · Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence. Given this success, it remains an open question whether scalable learners like BERT can become fully proficient in the syntax of natural language by virtue of data scale alone, or whether they still benefit from more explicit syntactic biases...
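
To make the idea of injecting syntactic biases through distillation concrete, the sketch below shows a generic knowledge-distillation objective for a masked-language-model student: the standard masked-token cross-entropy is interpolated with a KL term toward a teacher's word distributions. This is only an illustrative sketch, not the paper's exact objective; the function name, tensor shapes, and the `alpha` weight are hypothetical, and the teacher is assumed to be some syntax-aware language model supplying per-position word distributions.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs, gold_ids, alpha=0.5):
    """Hypothetical sketch of a distillation pretraining loss.

    student_logits: [num_masked, vocab] raw scores from the bidirectional student
    teacher_probs:  [num_masked, vocab] word distributions from a syntactic teacher LM
    gold_ids:       [num_masked] indices of the original (masked-out) tokens
    alpha:          interpolation weight on the distillation term (assumed, not from the paper)
    """
    log_probs = F.log_softmax(student_logits, dim=-1)
    # Standard masked-LM objective on the gold tokens.
    mlm_loss = F.nll_loss(log_probs, gold_ids)
    # KL(teacher || student): pulls the student's predictions toward the
    # teacher's (syntax-informed) distribution at each masked position.
    kd_loss = F.kl_div(log_probs, teacher_probs, reduction="batchmean")
    return (1.0 - alpha) * mlm_loss + alpha * kd_loss
```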
