TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Long-range modeling	LRA	SeqBoat	ListOps	61.70	# 4
Long-range modeling	LRA	SeqBoat	Text	89.60	# 3
Long-range modeling	LRA	SeqBoat	Retrieval	91.28	# 2
Long-range modeling	LRA	SeqBoat	Image	90.10	# 2
Long-range modeling	LRA	SeqBoat	Pathfinder	96.35	# 2
Long-range modeling	LRA	SeqBoat	Avg	87.62	# 2
Long-range modeling	LRA	SeqBoat	Pathfinder-X	96.68	# 5
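
The Avg row can be checked against the per-task rows. The sketch below assumes that Avg is the unweighted mean over all six tasks, Pathfinder-X included; the page does not state how the average is computed, so this is an assumption that the numbers happen to support.

```python
# Per-task LRA accuracies for SeqBoat, copied from the table above.
task_scores = {
    "ListOps": 61.70,
    "Text": 89.60,
    "Retrieval": 91.28,
    "Image": 90.10,
    "Pathfinder": 96.35,
    "Pathfinder-X": 96.68,
}

# Assumption: "Avg" is the plain (unweighted) mean over all six tasks.
average = sum(task_scores.values()) / len(task_scores)

print(f"{average:.2f}")  # prints 87.62, matching the Avg row
```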

Sparse Modular Activation for Efficient Sequence Modeling

NeurIPS 2023 · Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai

Recent hybrid models combining Linear State Space Models (SSMs) with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks. However, current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. To address this limitation, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. Through allowing each element to skip non-activated sub-modules, SMA reduces computation and memory consumption of neural networks at both training and inference stages. To validate the effectiveness of SMA on sequence modeling, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to only conduct local attention on the activated inputs, SeqBoat can achieve linear inference complexity with theoretically infinite attention span, and provide substantially better quality-efficiency trade-off than the chunking-based models. With experiments on a wide range of tasks, including long sequence modeling, speech classification and language modeling, SeqBoat brings new state-of-the-art results among hybrid models with linear complexity, and reveals the amount of attention needed for each task through the learned sparse activation patterns. Our code is publicly available at https://github.com/renll/SeqBoat.
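
To make the skip-and-scatter idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). The function names, the hard `threshold`, and the per-element `scores` are all hypothetical, and a hard threshold is not differentiable, so this only illustrates the control flow: activated elements are compressed into a shorter sequence, a sub-module runs on that sequence alone, and the results are scattered back while every other element passes through untouched. A windowed mean stands in for local attention.

```python
from typing import Callable, List, Sequence


def local_mean(seq: List[float], window: int = 2) -> List[float]:
    """Stand-in for local attention: causal mean over the last `window` items."""
    out = []
    for j in range(len(seq)):
        chunk = seq[max(0, j - window + 1): j + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def sparse_activate(
    xs: Sequence[float],
    scores: Sequence[float],
    submodule: Callable[[List[float]], List[float]],
    threshold: float = 0.5,
) -> List[float]:
    """Run `submodule` only on elements whose (hypothetical) score > threshold."""
    # 1. Decide which positions are activated.
    active = [i for i, s in enumerate(scores) if s > threshold]
    # 2. Compress: gather only the activated elements, preserving order.
    compressed = [xs[i] for i in active]
    # 3. The sub-module sees the short compressed sequence only.
    processed = submodule(compressed) if compressed else []
    # 4. Scatter results back; skipped positions keep their input value.
    out = list(xs)
    for i, y in zip(active, processed):
        out[i] = y
    return out


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [0.9, 0.1, 0.2, 0.8, 0.7]  # hypothetical activation scores
result = sparse_activate(xs, scores, local_mean)
print(result)  # prints [1.0, 2.0, 3.0, 2.5, 4.5]
```

In this toy run, positions 0, 3 and 4 are activated. With a window of only 2 over the compressed sequence, position 3 still mixes in position 0, three steps back in the original input. This is one way to read the abstract's claim that local attention over activated inputs can keep linear cost while the attention span is not bounded by the window size.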

Code

renll/seqboat official

Tasks

Chunking

Language Modelling

Long-range modeling

Datasets

Speech Commands

LRA

ListOps

Results from the Paper

Ranked #2 on Long-range modeling on LRA

Methods

SMA
