Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding

Transformer has become ubiquitous in the deep learning field. One of the key ingredients behind its success is the self-attention mechanism, which allows fully-connected contextual encoding over input tokens...
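The clustering idea named in the title can be illustrated with a minimal sketch: token hidden states are assigned to the nearest of #C centroids (e.g., obtained by k-means), and full self-attention is computed only within each cluster, avoiding the quadratic cost of fully-connected attention over the whole sequence. This is an illustrative simplification, not the authors' implementation: the function name and shapes are assumptions, and it uses a single head with unprojected Q = K = V.

```python
import torch
import torch.nn.functional as F

def clustered_attention(x, centroids):
    """Sketch of clustering-based sparse attention.

    x:         (seq_len, dim) token hidden states
    centroids: (num_clusters, dim) cluster centers, e.g., from k-means
               over hidden states (hypothetical helper not shown here)
    """
    # Hard-assign each token to its nearest centroid.
    assign = torch.cdist(x, centroids).argmin(dim=-1)   # (seq_len,)
    out = torch.zeros_like(x)
    for c in range(centroids.size(0)):
        idx = (assign == c).nonzero(as_tuple=True)[0]   # tokens in cluster c
        if idx.numel() == 0:
            continue
        xc = x[idx]                                     # (n_c, dim)
        # Full self-attention restricted to this cluster's tokens.
        scores = xc @ xc.t() / xc.size(-1) ** 0.5       # (n_c, n_c)
        out[idx] = F.softmax(scores, dim=-1) @ xc
    return out

# Tiny usage example with random states and 8 clusters.
x = torch.randn(128, 64)
centroids = torch.randn(8, 64)
y = clustered_attention(x, centroids)
print(y.shape)  # torch.Size([128, 64])
```

A real model would add learned Q/K/V projections, multiple attention heads, and a strategy for choosing and updating the centroids during training; the sketch only shows how cluster assignment turns dense attention into block-sparse attention.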

Results from the Paper


Task                             Dataset                      Model                     Metric Name               Metric Value   Global Rank
Language Modelling               enwik8                       Cluster-Former (#C=512)   Bit per Character (BPC)   1.22           #18
Question Answering               Natural Questions (long)     Cluster-Former (#C=512)   F1                        76.5           #1
Question Answering               Natural Questions (short)    Cluster-Former (#C=512)   F1                        57.1           #2
Question Answering               Quasar-T                     Cluster-Former (#C=512)   EM                        54.0           #1
Open-Domain Question Answering   SearchQA                     Cluster-Former (#C=512)   EM                        68.0           #1

Methods used in the Paper