BP-Transformer: Modelling Long-Range Context via Binary Partitioning

11 Nov 2019 · Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang

The Transformer model is widely successful on many natural language processing tasks. However, the quadratic complexity of self-attention limits its application on long text...

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Language Modelling | enwik8 | BP-Transformer - 12 Layers | Bit per Character (BPC) | 1.02 | # 9 |
| Sentiment Analysis | IMDb | BP-Transformer + GloVe | Accuracy | 92.12 | # 11 |
| Machine Translation | IWSLT2015 Chinese-English | BP-Transformer | BLEU | 19.84 | # 1 |
| Sentiment Analysis | SST-5 Fine-grained classification | BP-Transformer + GloVe | Accuracy | 52.71 | # 9 |
| Language Modelling | Text8 | BP-Transformer - 12 Layers | Bit per Character (BPC) | 1.11 | # 5 |

Methods used in the Paper