Learning Implicit Temporal Alignment for Few-shot Video Classification

11 May 2021 · Songyang Zhang, Jiale Zhou, Xuming He

Few-shot video classification aims to learn new video categories from only a few labeled examples, alleviating the burden of costly annotation in real-world applications. However, learning a class-invariant spatial-temporal representation is particularly challenging in this setting. To address this, we propose a novel matching-based few-shot learning strategy for video sequences. Our main idea is to introduce an implicit temporal alignment for a video pair, which estimates the similarity between the two videos accurately and robustly. Moreover, we design an effective context encoding module that incorporates spatial and feature-channel context, resulting in better modeling of intra-class variations. To train our model, we develop a multi-task loss for learning video matching, which yields video features that generalize better. Extensive experiments on two challenging benchmarks show that our method outperforms prior art by a sizable margin on Something-Something V2 and achieves competitive results on Kinetics.
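The abstract does not spell out how the implicit temporal alignment is computed. As a rough illustration of the general idea only, the sketch below compares two videos represented as per-frame feature matrices by letting each query frame attend softly over all support frames, rather than forcing a fixed frame-to-frame correspondence. This is a generic attention-style soft alignment, not ITANet's formulation; the function names, the temperature value, and the random stand-in features are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row (one frame feature) to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def soft_alignment_similarity(query: np.ndarray, support: np.ndarray,
                              temperature: float = 0.1) -> float:
    """Hypothetical video-to-video similarity via soft temporal alignment.

    query:   (T_q, D) per-frame features of the query video.
    support: (T_s, D) per-frame features of one support video.
    Generic illustration only; not the ITANet formulation.
    """
    q = l2_normalize(query)
    s = l2_normalize(support)
    sim = q @ s.T                                   # (T_q, T_s) frame-pair cosine similarities
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability for exp
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)   # each query frame attends over support frames
    aligned = l2_normalize(weights @ s)             # (T_q, D) softly aligned support features
    # Mean cosine similarity between each query frame and its aligned counterpart.
    return float(np.mean(np.sum(q * aligned, axis=1)))


def classify(query: np.ndarray, support_set: dict) -> object:
    """Return the label of the support video most similar to the query."""
    scores = {label: soft_alignment_similarity(query, video)
              for label, video in support_set.items()}
    return max(scores, key=scores.get)


# Usage: 5 classes, one 8-frame support video each (random stand-in features).
rng = np.random.default_rng(0)
support_set = {c: rng.normal(size=(8, 64)) for c in range(5)}
# Query: the class-3 clip started two frames later (circular shift), plus noise.
query = np.roll(support_set[3], 2, axis=0) + 0.1 * rng.normal(size=(8, 64))
prediction = classify(query, support_set)
```

Because each query frame finds its own counterpart, the temporally shifted query is still matched to class 3. Note that this simplified score ignores frame order entirely (permuting either video's frames leaves it unchanged), so it could not tell an action from its reverse; a method aimed at Something-Something V2 would need to account for temporal order.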

Code

tonysy/PyAction (official)

Tasks

Action Recognition In Videos

Classification

Few-Shot Learning

Video Classification

Datasets

Something-Something V2

Results from the Paper

Ranked #1 on Action Recognition In Videos on FS-Something-Something V2-Small

Task	Dataset	Model	Metric Name	Metric Value (%)	Global Rank
Action Recognition In Videos	FS-Something-Something V2-Full	OTAM[3]++	Top-1 Accuracy (5-way 1-shot)	42.8	#2
Action Recognition In Videos	FS-Something-Something V2-Full	OTAM[3]++	Top-1 Accuracy (5-way 5-shot)	52.3	#2
Action Recognition In Videos	FS-Something-Something V2-Full	ITANet	Top-1 Accuracy (5-way 1-shot)	49.2	#1
Action Recognition In Videos	FS-Something-Something V2-Full	ITANet	Top-1 Accuracy (5-way 5-shot)	62.3	#1
Action Recognition In Videos	FS-Something-Something V2-Small	ITANet	Top-1 Accuracy (5-way 1-shot)	39.8	#1
Action Recognition In Videos	FS-Something-Something V2-Small	ITANet	Top-1 Accuracy (5-way 5-shot)	53.7	#1
Action Recognition In Videos	FS-Something-Something V2-Small	CMN[35]	Top-1 Accuracy (5-way 1-shot)	36.2	#2
Action Recognition In Videos	FS-Something-Something V2-Small	CMN[35]	Top-1 Accuracy (5-way 5-shot)	48.8	#2
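The metrics above follow the N-way K-shot episodic protocol: each test episode draws N classes with K labeled support videos per class, every query video is assigned to one of the N classes, and Top-1 accuracy is the fraction of queries classified correctly, averaged over many episodes. The sketch below shows that evaluation loop under stated assumptions: videos are already encoded as fixed-length feature vectors, and a simple nearest-prototype classifier stands in for the paper's matching model. The function names, query count per class, and episode count are hypothetical, not the benchmark's official settings.

```python
import numpy as np


def sample_episode(features_by_class, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode (hypothetical helper).

    features_by_class: dict mapping class -> array of shape (num_videos, D).
    Returns support (n_way, k_shot, D), query (n_way * n_query, D) and
    episode-local query labels in [0, n_way).
    """
    keys = list(features_by_class)
    chosen = rng.choice(len(keys), size=n_way, replace=False)
    support, query, labels = [], [], []
    for episode_label, key_index in enumerate(chosen):
        feats = features_by_class[keys[key_index]]
        # Disjoint support and query videos from the same class.
        idx = rng.choice(len(feats), size=k_shot + n_query, replace=False)
        support.append(feats[idx[:k_shot]])
        query.append(feats[idx[k_shot:]])
        labels.extend([episode_label] * n_query)
    return np.stack(support), np.concatenate(query), np.array(labels)


def evaluate_top1(features_by_class, n_way=5, k_shot=1, n_query=5,
                  n_episodes=200, seed=0):
    """Mean Top-1 accuracy (%) over episodes with a nearest-prototype classifier."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_episodes):
        support, query, labels = sample_episode(
            features_by_class, n_way, k_shot, n_query, rng)
        prototypes = support.mean(axis=1)                                 # (n_way, D)
        dists = ((query[:, None, :] - prototypes[None]) ** 2).sum(axis=-1)
        preds = dists.argmin(axis=1)                                      # top-1 prediction
        accuracies.append(float((preds == labels).mean()))
    return 100.0 * float(np.mean(accuracies))


# Usage with synthetic, well-separated features (10 classes, 20 videos each).
rng = np.random.default_rng(1)
data = {c: 10.0 * np.eye(16)[c] + 0.1 * rng.normal(size=(20, 16))
        for c in range(10)}
acc_1shot = evaluate_top1(data, k_shot=1)   # 5-way 1-shot
acc_5shot = evaluate_top1(data, k_shot=5)   # 5-way 5-shot
```

On these artificially separable features both settings reach 100%; on real video features the 5-shot setting is typically easier than 1-shot because the prototype averages more support examples, consistent with the gap between the two columns in the table.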
