TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Classification	Kinetics-400	RNL+TSM Ensemble(ResNet50, 8 + 16 frames)	Acc@1	77.4	# 132
Action Recognition	Something-Something V1	RNL+TSM Ensemble(R50+R101, ImageNet pretrained)	Top 1 Accuracy	54.1	# 32
Action Recognition	Something-Something V1	RNL+TSM Ensemble(R50+R101, ImageNet pretrained)	Top 5 Accuracy	82.2	# 20
Action Recognition	Something-Something V1	RNL+TSM Ensemble(ResNet50, ImageNet pretrained)	Top 1 Accuracy	52.7	# 39
Action Recognition	Something-Something V1	RNL+TSM Ensemble(ResNet50, ImageNet pretrained)	Top 5 Accuracy	81.5	# 21

Region-based Non-local Operation for Video Classification

17 Jul 2020 · Guoxi Huang, Adrian G. Bors

Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes optimization difficult. This paper presents region-based non-local (RNL) operations, a family of self-attention mechanisms that directly capture long-range dependencies without a deep stack of local operations. Given an intermediate feature map, our method recalibrates the feature at each position by aggregating information from the neighboring regions of all positions. By combining a channel attention module with the proposed RNL, we design an attention chain that can be integrated into off-the-shelf CNNs for end-to-end training. We evaluate our method on two video classification benchmarks. In these experiments, our method outperforms other attention mechanisms, and we achieve state-of-the-art performance on the Something-Something V1 dataset.
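
The recalibration step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it works on a 1-D sequence of feature vectors rather than a video feature map, summarizes each region with a plain mean (an assumption; the abstract does not say how regions are aggregated), scores region pairs with a softmax over dot products, and adds a residual connection as standard non-local blocks do. The names `region_mean`, `region_nonlocal`, and `radius` are hypothetical.

```python
import math


def region_mean(features, center, radius):
    """Average the feature vectors in a window around `center`, clipped at the borders."""
    lo = max(0, center - radius)
    hi = min(len(features), center + radius + 1)
    window = features[lo:hi]
    dim = len(features[0])
    return [sum(vec[d] for vec in window) / len(window) for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def region_nonlocal(features, radius=1):
    """Toy region-based non-local step on a 1-D sequence of feature vectors.

    Assumption: the affinity between positions i and j is computed from the
    mean of each position's neighborhood, not from the single features.
    """
    n = len(features)
    dim = len(features[0])
    regions = [region_mean(features, i, radius) for i in range(n)]
    output = []
    for i in range(n):
        # Compare the region around i with the region around every position j.
        scores = [sum(a * b for a, b in zip(regions[i], regions[j])) for j in range(n)]
        weights = softmax(scores)
        # Aggregate the features of all positions, weighted by region affinity.
        aggregated = [sum(weights[j] * features[j][d] for j in range(n)) for d in range(dim)]
        # Residual connection: the input feature is recalibrated, not replaced.
        output.append([features[i][d] + aggregated[d] for d in range(dim)])
    return output


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
recalibrated = region_nonlocal(features, radius=1)
```

With `radius=0` each region collapses to a single position, so the sketch reduces to a plain dot-product non-local step; larger radii make the affinities depend on local context rather than on individual features.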

Code

guoxih/region-based-non-local-netwo… official

Tasks

Action Classification

Action Recognition

Action Recognition In Videos

Classification

General Classification

Position

Video Classification

Datasets

Kinetics

Kinetics 400

Something-Something V1

Results from the Paper

Ranked #32 on Action Recognition on Something-Something V1

Methods

1x1 Convolution • Average Pooling • Channel Attention Module • Convolution • Dense Connections • Max Pooling • Non-Local Operation • ReLU • Sigmoid Activation