TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Classification	AViD	TokenLearner	Accuracy	53.8	# 1
Action Classification	Charades	TokenLearner	MAP	66.3	# 1
Image Classification	ImageNet	TokenLearner L/8 (24+11)	Top 1 Accuracy	88.87%	# 36
Image Classification	ImageNet	TokenLearner L/8 (24+11)	Number of params	460M	# 932
Image Classification	ImageNet	16-TokenLearner B/16 (21)	Top 1 Accuracy	87.07%	# 110
Image Classification	ImageNet ReaL	TokenLearner L/8 (24+11)	Accuracy	91.05%	# 6
Image Classification	ImageNet ReaL	TokenLearner L/8 (24+11)	Params	460M	# 49
Action Classification	Kinetics-400	TokenLearner 16at18 (L/10)	Acc@1	85.4	# 47
Action Classification	Kinetics-600	TokenLearner 16at18 w. Fuser (L/10)	Top-1 Accuracy	86.3	# 27
Action Classification	Kinetics-600	TokenLearner 16at18 w. Fuser (L/10)	Top-5 Accuracy	97.0	# 19

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tokenlearner-what-can-8-learned-tokens-do-for/action-classification-on-avid)](https://paperswithcode.com/sota/action-classification-on-avid?p=tokenlearner-what-can-8-learned-tokens-do-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tokenlearner-what-can-8-learned-tokens-do-for/action-classification-on-charades)](https://paperswithcode.com/sota/action-classification-on-charades?p=tokenlearner-what-can-8-learned-tokens-do-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tokenlearner-what-can-8-learned-tokens-do-for/image-classification-on-imagenet-real)](https://paperswithcode.com/sota/image-classification-on-imagenet-real?p=tokenlearner-what-can-8-learned-tokens-do-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tokenlearner-what-can-8-learned-tokens-do-for/action-classification-on-kinetics-600)](https://paperswithcode.com/sota/action-classification-on-kinetics-600?p=tokenlearner-what-can-8-learned-tokens-do-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tokenlearner-what-can-8-learned-tokens-do-for/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=tokenlearner-what-can-8-learned-tokens-do-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tokenlearner-what-can-8-learned-tokens-do-for/action-classification-on-kinetics-400)](https://paperswithcode.com/sota/action-classification-on-kinetics-400?p=tokenlearner-what-can-8-learned-tokens-do-for)`

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

21 Jun 2021 · Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

In this paper, we introduce a novel visual representation learning approach that relies on a handful of adaptively learned tokens and is applicable to both image and video understanding tasks. Instead of relying on hand-designed splitting strategies to obtain visual tokens and processing a large number of densely sampled patches for attention, our approach learns to mine important tokens in visual data. This makes it possible to find a few important visual tokens efficiently and effectively, and enables modeling of pairwise attention between such tokens over a longer temporal horizon for videos or over the spatial content of images. Our experiments demonstrate strong performance on several challenging benchmarks for both image and video recognition tasks. Importantly, because our tokens are adaptive, we achieve competitive results at a significantly reduced compute cost. We obtain results comparable to the state of the art on ImageNet while being computationally more efficient. We also confirm the effectiveness of the approach on multiple video datasets, including Kinetics-400, Kinetics-600, Charades, and AViD. The code is available at: https://github.com/google-research/scenic/tree/main/scenic/projects/token_learner
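
The idea of replacing many dense patches with a few adaptively learned tokens can be illustrated with a minimal NumPy sketch. This is not the official implementation (see the repository linked above for that). The function name `learn_tokens`, the single linear projection `w` standing in for the learned attention function, and the choice of sigmoid gating followed by average pooling over positions are all assumptions made for illustration.

```python
import numpy as np

def learn_tokens(x, w):
    """Reduce N patch features to S tokens (illustrative sketch only).

    x: (N, C) array of patch features, one row per spatial position.
    w: (C, S) projection; a hypothetical stand-in for a learned
       attention function, with one column per output token.
    Returns (tokens, attn) with shapes (S, C) and (N, S).
    """
    logits = x @ w                         # (N, S) score per position and token
    attn = 1.0 / (1.0 + np.exp(-logits))   # sigmoid gate, values in (0, 1)
    # Assumed pooling: average the gated features over all N positions,
    # so each token is an input-dependent spatial summary.
    tokens = (attn.T @ x) / x.shape[0]     # (S, C)
    return tokens, attn

rng = np.random.default_rng(0)
num_patches, channels, num_tokens = 196, 64, 8   # e.g. a 14x14 grid, 8 tokens
x = rng.normal(size=(num_patches, channels))
w = rng.normal(size=(channels, num_tokens)) * 0.1
tokens, attn = learn_tokens(x, w)

# Pairwise attention compares every token with every other token,
# so its cost grows with the square of the token count.
pairs_dense = num_patches ** 2     # 196 * 196 = 38416
pairs_learned = num_tokens ** 2    # 8 * 8 = 64
print(tokens.shape, pairs_dense, pairs_learned)
```

With these illustrative sizes, attention over 8 tokens involves 64 pairwise interactions instead of 38,416 over all 196 patches, which is the kind of saving the abstract refers to when it mentions reduced compute.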

Code

google-research/scenic official

3,000

keras-team/keras-io

2,646

rish-16/tokenlearner-pytorch

ariG23498/TokenLearner

Tasks

Action Classification

Image Classification

Representation Learning

Video Recognition

Video Understanding

Datasets

ImageNet

Kinetics

Kinetics-400

Charades

Kinetics-600

JFT-300M

AViD

Results from the Paper

Ranked #1 on Action Classification on Charades
