Blockwise Self-Attention for Long Document Understanding

7 Nov 2019 · Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen-tau Yih, Sinong Wang, Jie Tang

We present BlockBERT, a lightweight and efficient BERT model for better modeling long-distance dependencies. Our model extends BERT by introducing sparse block structures into the attention matrix to reduce both memory consumption and training/inference time, which also enables attention heads to capture either short- or long-range contextual information.
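To make the blockwise idea concrete, below is a minimal sketch (not the authors' released code) of blockwise sparse attention as the abstract describes it: the sequence is split into equal blocks, and each head is assigned a block permutation so that query block i may only attend to key block pi(i). The identity permutation yields local (short-range) heads; a shifted permutation yields long-range heads. Function and variable names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def blockwise_attention(q, k, v, num_blocks, perm):
    """q, k, v: (batch, heads, seq_len, head_dim); seq_len must be divisible
    by num_blocks; perm maps each query-block index to one key-block index."""
    b, h, n, d = q.shape
    blk = n // num_blocks

    # Mask that is True only inside the permitted (query block, key block) pairs.
    mask = torch.zeros(n, n, dtype=torch.bool, device=q.device)
    for qi, ki in enumerate(perm):
        mask[qi * blk:(qi + 1) * blk, ki * blk:(ki + 1) * blk] = True

    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (b, h, n, n)
    scores = scores.masked_fill(~mask, float("-inf"))  # zero out disallowed pairs
    return F.softmax(scores, dim=-1) @ v               # (b, h, n, head_dim)

# Example: 2 blocks. The identity permutation keeps attention local within each
# block; the cyclic shift lets each block attend to the other (long-range head).
q = k = v = torch.randn(1, 1, 8, 16)
local = blockwise_attention(q, k, v, num_blocks=2, perm=[0, 1])
long_range = blockwise_attention(q, k, v, num_blocks=2, perm=[1, 0])
```

Note that this dense-mask version only illustrates the attention pattern; the memory and time savings reported in the paper come from computing the nonzero blocks directly rather than masking a full attention matrix.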


