TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Long-range modeling	SCROLLS	CoLT5 XL	GovRep	61.3/32.2/33.8	# 11
Long-range modeling	SCROLLS	CoLT5 XL	SumScr	36.4/10.2/21.7	# 12
Long-range modeling	SCROLLS	CoLT5 XL	QMSum	36.2/12.9/24.3	# 12
Long-range modeling	SCROLLS	CoLT5 XL	Qspr	53.9	# 1
Long-range modeling	SCROLLS	CoLT5 XL	Nrtv	31.1	# 1
Long-range modeling	SCROLLS	CoLT5 XL	QALT EM-T/H	48.1/43.8	# 10
Long-range modeling	SCROLLS	CoLT5 XL	CNLI	88.4	# 2
Long-range modeling	SCROLLS	CoLT5 XL	Avg.	43.51	# 1
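
The Avg. row is not explained on this page. The sketch below assumes the aggregation convention of the SCROLLS benchmark: each ROUGE-1/2/L triple is reduced to its geometric mean, QALT contributes only its EM-T value, and the seven per-dataset scores are then averaged arithmetically. Under that assumption the figures in the table reproduce the reported 43.51. The function name `scrolls_average` is hypothetical.

```python
from statistics import geometric_mean, mean

# Scores copied from the table above (CoLT5 XL on SCROLLS).
ROUGE_TRIPLES = {
    "GovRep": (61.3, 32.2, 33.8),
    "SumScr": (36.4, 10.2, 21.7),
    "QMSum": (36.2, 12.9, 24.3),
}
SINGLE_SCORES = {"Qspr": 53.9, "Nrtv": 31.1, "CNLI": 88.4}
QALT_EM_T = 48.1  # EM-H (43.8) is assumed not to enter the average


def scrolls_average(rouge_triples, single_scores, qalt_em_t):
    """Average seven per-dataset scores (assumed SCROLLS convention)."""
    # Summarization datasets: geometric mean of ROUGE-1/2/L.
    per_dataset = [geometric_mean(triple) for triple in rouge_triples.values()]
    # Single-number datasets are used as reported.
    per_dataset += list(single_scores.values())
    per_dataset.append(qalt_em_t)
    return mean(per_dataset)


avg = scrolls_average(ROUGE_TRIPLES, SINGLE_SCORES, QALT_EM_T)
print(f"{avg:.2f}")  # prints 43.51
```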


CoLT5: Faster Long-Range Transformers with Conditional Computation

17 Mar 2023 · Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai

Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive, not only because of quadratic attention complexity but also because feedforward and projection layers are applied to every token. However, not all tokens are equally important, especially in longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving state-of-the-art (SOTA) results on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.
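
To make the idea of conditional computation concrete, here is a minimal sketch of a feedforward layer that applies a cheap branch to every token and an expensive branch only to the highest-scoring tokens. The abstract does not describe the routing mechanism, so the top-k selection, the per-token `scores`, the gating of the heavy output by the score, and the names `conditional_feedforward`, `light`, and `heavy` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Branch = Callable[[Vector], Vector]


def conditional_feedforward(
    tokens: Sequence[Vector],
    scores: Sequence[float],
    light: Branch,
    heavy: Branch,
    k: int,
) -> Tuple[List[Vector], List[int]]:
    """Apply `light` to all tokens and `heavy` only to the k top-scoring ones.

    Returns the per-token outputs and the sorted indices of routed tokens.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0 <= k <= len(tokens):
        raise ValueError("k must be between 0 and the number of tokens")

    # Cheap branch: every token pays this cost.
    outputs = [list(light(tok)) for tok in tokens]

    # Assumed routing rule: keep the k positions with the highest score.
    routed = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]

    # Expensive branch: only routed tokens pay this cost; its output is
    # scaled by the token's score before being added (an assumption).
    for i in routed:
        heavy_out = heavy(tokens[i])
        outputs[i] = [a + scores[i] * b for a, b in zip(outputs[i], heavy_out)]

    return outputs, sorted(routed)
```

With n tokens and k much smaller than n, the heavy branch runs k times instead of n times, which is the kind of saving the abstract attributes to spending more computation only on important tokens.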

Tasks

Long-range modeling

Datasets

Natural Questions

TriviaQA

NarrativeQA

GovReport

QuALITY

QASPER

SummScreen

SCROLLS

ContractNLI

Results from the Paper

Ranked #1 on Long-range modeling on SCROLLS

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
