TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Recognition	HMDB-51	PERF-Net (distilled S3D-G)	Average accuracy of 3 splits	83.2	# 12
Action Classification	Kinetics-600	PERF-Net (distilled ResNet50-G)	Top-1 Accuracy	82.0	# 45
Action Classification	Kinetics-600	PERF-Net (distilled ResNet50-G)	Top-5 Accuracy	95.7	# 33
Action Recognition	UCF101	PERF-Net (multi-distilled S3D)	3-fold Accuracy	98.6	# 5
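
As a reading aid for the metric names in the table, here is a minimal sketch of how top-k accuracy and an average over dataset splits are commonly computed. The function names and the toy scores are hypothetical and are not taken from the paper or from any evaluation toolkit; the actual benchmark protocols may differ in detail.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists, one per sample (toy data, assumed layout).
    labels: list of integer class indices, one per sample.
    """
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return 100.0 * correct / len(labels)


def mean_over_splits(accuracies):
    """Plain arithmetic mean of per-split accuracies (e.g. 3 train/test splits)."""
    return sum(accuracies) / len(accuracies)


# Toy example with 3 samples and 3 classes (made-up numbers).
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With k=1 only the single best-scoring class counts, so top-5 accuracy is always at least as high as top-1 on the same predictions.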

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/perf-net-pose-empowered-rgb-flow-net/action-recognition-in-videos-on-ucf101)](https://paperswithcode.com/sota/action-recognition-in-videos-on-ucf101?p=perf-net-pose-empowered-rgb-flow-net)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/perf-net-pose-empowered-rgb-flow-net/action-recognition-in-videos-on-hmdb-51)](https://paperswithcode.com/sota/action-recognition-in-videos-on-hmdb-51?p=perf-net-pose-empowered-rgb-flow-net)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/perf-net-pose-empowered-rgb-flow-net/action-classification-on-kinetics-600)](https://paperswithcode.com/sota/action-classification-on-kinetics-600?p=perf-net-pose-empowered-rgb-flow-net)`

PERF-Net: Pose Empowered RGB-Flow Net

28 Sep 2020 · Yinxiao Li, Zhichao Lu, Xuehan Xiong, Jonathan Huang

In recent years, many works in the video action recognition literature have shown that two-stream models (combining spatial and temporal input streams) are necessary for achieving state-of-the-art performance. In this paper we show the benefits of including yet another stream based on human pose estimated from each frame, specifically by rendering pose on input RGB frames. At first blush, this additional stream may seem redundant given that human pose is fully determined by RGB pixel values; however, we show (perhaps surprisingly) that this simple and flexible addition can provide complementary gains. Using this insight, we then propose a new model, which we dub PERF-Net (short for Pose Empowered RGB-Flow Net), that combines this new pose stream with the standard RGB and flow-based input streams via distillation techniques. We show that our model outperforms the state of the art by a large margin on a number of human action recognition datasets while not requiring flow or pose to be explicitly computed at inference time. The proposed pose stream is also part of the winning solution of the ActivityNet Kinetics Challenge 2020.
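
The abstract mentions combining the streams "via distillation techniques" so that flow and pose need not be computed at inference time. The following is a minimal sketch of one generic way such training can be set up: an RGB-only student is trained with a classification loss plus a penalty for deviating from features produced by frozen teacher streams. The loss form (cross-entropy plus mean squared error), the function names, and the `weight` parameter are all assumptions for illustration; the abstract does not specify the paper's actual loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_distillation_loss(student_logits, student_features,
                                    teacher_features, label, weight=1.0):
    """Hypothetical training loss for an RGB-only student.

    teacher_features maps a teacher name (e.g. "flow", "pose") to a feature
    vector precomputed by that frozen teacher stream. Only the student runs
    at inference time, so the teachers' inputs are needed during training only.
    """
    task_loss = cross_entropy(student_logits, label)
    distill_loss = sum(mse(student_features, t)
                       for t in teacher_features.values())
    return task_loss + weight * distill_loss
```

In this sketch, a student whose features already match every teacher pays only the classification loss; `weight` trades off imitation of the teachers against fitting the labels.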

Tasks

Action Classification

Action Recognition

Temporal Action Localization

Datasets

ImageNet

MS COCO

UCF101

Kinetics

HMDB51

Kinetics-600

Kinetics-700

Results from the Paper

Ranked #5 on Action Recognition on UCF101
