ConvBERT: Improving BERT with Span-based Dynamic Convolution

6 Aug 2020 · Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

Pre-trained language models like BERT and its variants have recently achieved impressive performance in various natural language understanding tasks. However, BERT relies heavily on the global self-attention block and thus suffers from a large memory footprint and high computation cost. ConvBERT addresses this by replacing some of the self-attention heads with a span-based dynamic convolution that models local dependencies directly, forming a mixed attention block that is more efficient at learning both global and local context.
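The abstract gives no implementation details, but the core idea of a span-based dynamic convolution can be illustrated with a short sketch. The following is a minimal, self-contained PyTorch sketch, not the authors' released code: the class name `SpanBasedDynamicConv` and the sizes `hidden_dim` and `kernel_size` are illustrative assumptions. It shows how per-position convolution kernels can be generated from a local span of tokens (rather than a single token) and applied as a softmax-normalized lightweight convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanBasedDynamicConv(nn.Module):
    """Sketch of one span-based dynamic convolution head.

    Kernel weights are generated from a local span around each position
    (summarized here with a depthwise convolution), then normalized and
    applied as a lightweight convolution over the same window.
    Names and sizes are assumptions for illustration, not the paper's code.
    """

    def __init__(self, hidden_dim=64, kernel_size=9):
        super().__init__()
        assert kernel_size % 2 == 1, "sketch assumes an odd kernel size"
        self.kernel_size = kernel_size
        # Depthwise conv summarizes a span of tokens around each position.
        self.span_conv = nn.Conv1d(hidden_dim, hidden_dim, kernel_size,
                                   padding=kernel_size // 2, groups=hidden_dim)
        # Projection from the span summary to one weight per kernel tap.
        self.kernel_proj = nn.Linear(hidden_dim, kernel_size)

    def forward(self, x):  # x: (batch, seq_len, hidden_dim)
        # Span summary per position: (batch, seq_len, hidden_dim).
        span = self.span_conv(x.transpose(1, 2)).transpose(1, 2)
        # Per-position kernels, softmax-normalized over the taps.
        kernels = F.softmax(self.kernel_proj(span), dim=-1)  # (b, t, k)
        # Sliding windows of size k around each position: (b, t, d, k).
        pad = self.kernel_size // 2
        windows = F.pad(x, (0, 0, pad, pad)).unfold(1, self.kernel_size, 1)
        # Mix each window with its own dynamically generated kernel.
        return torch.einsum('btdk,btk->btd', windows, kernels)

x = torch.randn(2, 16, 64)
print(SpanBasedDynamicConv()(x).shape)  # torch.Size([2, 16, 64])
```

Generating the kernel from a span (here via a depthwise convolution) rather than from a single token is what distinguishes span-based dynamic convolution from the per-token dynamic convolution it builds on; unlike self-attention, its cost is linear in sequence length.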
