TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Self-Supervised Audio Classification	AudioSet (MLP)	BraVe:V-FA (TSM-50x2)	Top-1 Accuracy	34.8	# 1
Self-Supervised Audio Classification	ESC-50	BraVe:V-FA (TSM-50x2)	Top-1 Accuracy	91.1	# 1
Self-Supervised Action Recognition	HMDB51	BraVe:V-FA (TSM-50x2)	Top-1 Accuracy	70.5	# 6
Self-Supervised Action Recognition	HMDB51	BraVe:V-FA (TSM-50x2)	Frozen	false	# 1
Self-Supervised Action Recognition	HMDB51 (finetuned)	BraVe:V-FA (TSM-50x2)	Top-1 Accuracy	77.8	# 1
Self-Supervised Action Recognition	Kinetics-600	BraVe:V-FA (TSM-50x2)	Top-1 Accuracy	71.4	# 3
Self-Supervised Action Recognition	UCF101	BraVe:V-FA (TSM-50x2)	3-fold Accuracy	93.1	# 15
Self-Supervised Action Recognition	UCF101	BraVe:V-FA (TSM-50x2)	Frozen	false	# 1
Self-Supervised Action Recognition	UCF101 (finetuned)	BraVe:V-FA (TSM-50x2)	3-fold Accuracy	95.7	# 1

Broaden Your Views for Self-Supervised Video Learning

ICCV 2021 · Adrià Recasens, Pauline Luc, Jean-Baptiste Alayrac, Luyu Wang, Ross Hemsley, Florian Strub, Corentin Tallec, Mateusz Malinowski, Viorica Patraucean, Florent Altché, Michal Valko, Jean-Bastien Grill, Aäron van den Oord, Andrew Zisserman

Most successful self-supervised learning methods are trained to align the representations of two independent views from the data. State-of-the-art methods in video are inspired by image techniques, where these two views are similarly extracted by cropping and augmenting the resulting crop. However, these methods miss a crucial element in the video domain: time. We introduce BraVe, a self-supervised learning framework for video. In BraVe, one of the views has access to a narrow temporal window of the video while the other view has broad access to the video content. Our models learn to generalise from the narrow view to the general content of the video. Furthermore, BraVe processes the views with different backbones, enabling the use of alternative augmentations or modalities in the broad view such as optical flow, randomly convolved RGB frames, audio or their combinations. We demonstrate that BraVe achieves state-of-the-art results in self-supervised representation learning on standard video and audio classification benchmarks including UCF101, HMDB51, Kinetics, ESC-50 and AudioSet.
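
The narrow/broad split described in the abstract can be illustrated with a minimal sketch of temporal view sampling. This is not the authors' implementation: the function name `sample_views`, the window lengths (16 and 128 frames), the 300-frame clip, and the choice to sample the two windows independently are all assumptions made for illustration only.

```python
import random
from typing import List, Tuple


def sample_views(
    num_frames: int,
    narrow_len: int,
    broad_len: int,
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """Return frame indices for a narrow and a broad temporal view.

    Hypothetical helper: the windows are drawn independently, which is
    an assumption of this sketch and not a detail taken from the paper.
    """
    if not 0 < narrow_len <= broad_len <= num_frames:
        raise ValueError("need 0 < narrow_len <= broad_len <= num_frames")
    # Pick a start so that each contiguous window fits inside the video.
    narrow_start = rng.randint(0, num_frames - narrow_len)
    broad_start = rng.randint(0, num_frames - broad_len)
    narrow = list(range(narrow_start, narrow_start + narrow_len))
    broad = list(range(broad_start, broad_start + broad_len))
    return narrow, broad


# Assumed sizes: a 300-frame video, a 16-frame narrow view, a 128-frame broad view.
rng = random.Random(0)
narrow, broad = sample_views(num_frames=300, narrow_len=16, broad_len=128, rng=rng)
print(len(narrow), len(broad))  # prints "16 128"
```

In a full pipeline, each index list would select the frames (or the matching audio or optical-flow segment) fed to that view's own backbone, and the training objective would align the two resulting representations.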

Code

deepmind/brave (official)

Tasks

Audio Classification

Optical Flow Estimation

Representation Learning

Self-Supervised Action Recognition

Self-Supervised Audio Classification

Self-Supervised Learning

Datasets

UCF101

Kinetics

HMDB51

Kinetics-400

AudioSet

ESC-50

Kinetics-600

Results from the Paper

Ranked #1 on Self-Supervised Action Recognition on HMDB51 (finetuned)
