Action recognition with spatial-temporal discriminative filter banks

Action recognition has seen dramatic performance improvements in the last few years. Most current state-of-the-art work either aims to improve performance through changes to the backbone CNN, or explores different trade-offs between computational efficiency and accuracy, again by altering the backbone. However, almost all of these works keep the last layers of the network unchanged: a global average pooling followed by a fully connected layer. In this work we focus on improving the representational capacity of the network, but rather than altering the backbone, we improve its last layers, where changes have little impact on computational cost. In particular, we show that current architectures are poorly sensitive to finer details, and we exploit recent advances in the fine-grained recognition literature to address this weakness. With the proposed approach, we obtain state-of-the-art performance on Kinetics-400 and Something-Something-V1, the two major large-scale action recognition benchmarks.
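
As a rough illustration of the idea, the sketch below replaces the standard global-average-pooling head with one that also max-pools the responses of a bank of per-class filters over all spatio-temporal locations, so that strong local evidence is not averaged away. It is a minimal PyTorch sketch assuming 1x1x1 convolutional filters and five filters per class; the branch names follow the "GB + DF + LB" label in the results table below (global branch, discriminative filter bank, local branch), with the local branch omitted for brevity. This is an assumed construction, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FilterBankHead(nn.Module):
    """Sketch of a classification head: a global-average-pooling branch (GB)
    plus a bank of per-class 1x1x1 "discriminative filters" (DF) whose
    responses are max-pooled over space-time to keep fine-grained local cues.
    Layer choices here are assumptions, not the paper's exact design."""

    def __init__(self, in_channels: int, num_classes: int, filters_per_class: int = 5):
        super().__init__()
        self.filters_per_class = filters_per_class
        # GB: the standard head the paper identifies as insensitive to fine
        # details (global average pooling + fully connected layer).
        self.global_fc = nn.Linear(in_channels, num_classes)
        # DF: filters_per_class detectors per class, applied at every
        # spatio-temporal location of the backbone feature map.
        self.filter_bank = nn.Conv3d(in_channels, num_classes * filters_per_class, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: backbone feature map of shape (N, C, T, H, W)
        n = x.shape[0]
        logits_gb = self.global_fc(x.mean(dim=(2, 3, 4)))             # (N, K)
        responses = self.filter_bank(x)                               # (N, K*F, T, H, W)
        peak = responses.flatten(2).max(dim=2).values                 # max over T*H*W -> (N, K*F)
        logits_df = peak.view(n, -1, self.filters_per_class).mean(dim=2)  # (N, K)
        return logits_gb + logits_df                                  # fuse the two branches

head = FilterBankHead(in_channels=2048, num_classes=400)   # e.g. ResNet-152 features, Kinetics-400 classes
feats = torch.randn(2, 2048, 4, 7, 7)                      # (N, C, T, H, W)
logits = head(feats)                                       # -> torch.Size([2, 400])
```

Max-pooling the filter responses, rather than averaging them, is what lets a single small but discriminative region dominate a class score, which is the fine-grained sensitivity the standard averaged head lacks.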


Results from the Paper


Ranked #36 on Action Recognition on Something-Something V1 (using extra training data)

Task | Dataset | Model | Metric | Value | Global Rank | Uses Extra Training Data
Action Classification | Kinetics-400 | GB + DF + LB (ResNet-152, ImageNet pretrained) | Acc@1 | 78.8 | #116 | Yes
Action Recognition | Something-Something V1 | GB + DF + LB (ResNet-152, ImageNet pretrained) | Top-1 Accuracy | 53.4 | #36 | Yes
