Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

5 Jun 2020 · Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le

With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the much-overlooked redundancy in maintaining a full-length token-level representation, especially for tasks that only require a single-vector representation of the sequence.
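To make the redundancy argument concrete, the following is a minimal sketch (not the authors' code) of the core operation Funnel-Transformer applies between encoder blocks: compressing a full-length sequence of hidden states into a shorter one via strided mean pooling along the sequence axis.

```python
import numpy as np

def pool_sequence(hidden, stride=2):
    """Mean-pool hidden states along the sequence axis.

    hidden: array of shape (seq_len, d_model). For simplicity this sketch
    assumes seq_len is divisible by `stride`; the paper handles padding
    and the attention-side details separately.
    """
    seq_len, d_model = hidden.shape
    return hidden.reshape(seq_len // stride, stride, d_model).mean(axis=1)

# 8 tokens with d_model=4; one pooling step halves the sequence length,
# so deeper blocks operate on fewer positions at lower cost.
h = np.arange(8 * 4, dtype=float).reshape(8, 4)
pooled = pool_sequence(h)
print(pooled.shape)  # (4, 4)
```

Stacking such pooling steps is what gives the architecture its "funnel" shape: the sequence shrinks while the saved compute can be reinvested in a deeper or wider model.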


Results from the Paper


TASK                   DATASET  MODEL      METRIC             VALUE  GLOBAL RANK
Reading Comprehension  RACE     B10-10-10  Accuracy           85.7   # 4
Reading Comprehension  RACE     B10-10-10  Accuracy (High)    84.4   # 4
Reading Comprehension  RACE     B10-10-10  Accuracy (Middle)  88.8   # 2
