Zero-Shot Action Recognition
34 papers with code • 7 benchmarks • 6 datasets
Benchmarks
These leaderboards are used to track progress in Zero-Shot Action Recognition
Libraries
Use these libraries to find Zero-Shot Action Recognition models and implementations.
Most implemented papers
Tell me what you see: A zero-shot action recognition method based on natural language descriptions
To the best of our knowledge, this is the first work to represent both videos and labels with descriptive sentences.
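A minimal sketch of the idea in this excerpt: if both the video and each candidate label are represented as descriptive sentences, zero-shot classification reduces to matching sentence embeddings. The encoder below (sentence-transformers with `all-MiniLM-L6-v2`) and the hand-written descriptions are stand-ins, not the paper's actual models or description sources.

```python
# Match a video's description against label descriptions in embedding space.
# Model name and descriptions are illustrative stand-ins, not the paper's.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

video_description = "A person repeatedly strikes a ball over a net with a racket."
label_descriptions = {
    "playing tennis": "Two players hit a ball across a net using rackets.",
    "swimming": "A person moves through water using arm and leg strokes.",
    "archery": "A person draws a bow and releases an arrow at a target.",
}

video_emb = encoder.encode(video_description, convert_to_tensor=True)
label_embs = encoder.encode(list(label_descriptions.values()), convert_to_tensor=True)

# Cosine similarity between the video sentence and each label sentence;
# the most similar label is the zero-shot prediction.
scores = util.cos_sim(video_emb, label_embs)[0]
prediction = list(label_descriptions)[int(scores.argmax())]
print(prediction)
```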
End-to-End Semantic Video Transformer for Zero-Shot Action Recognition
While video action recognition has been an active area of research for several years, zero-shot action recognition has only recently started gaining traction.
Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions
However, due to the complexity of actions, it remains challenging to transfer knowledge learned from source to target action domains.
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification
Further, we propose a class generator that synthesizes features of unseen classes by interpolating and extrapolating the features of seen classes.
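The interpolate-and-extrapolate step can be sketched in a few lines. This is not the paper's generator; it only illustrates the geometric operation on per-class feature prototypes, with random vectors standing in for real features.

```python
# Synthesize unseen-class features from two seen-class prototypes:
# alpha in (0, 1) interpolates between them; alpha outside [0, 1] extrapolates.
import torch

def synthesize(proto_a: torch.Tensor, proto_b: torch.Tensor, alpha: float) -> torch.Tensor:
    return (1.0 - alpha) * proto_a + alpha * proto_b

seen_feats = {"run": torch.randn(512), "jump": torch.randn(512)}  # stand-in prototypes
interpolated = synthesize(seen_feats["run"], seen_feats["jump"], alpha=0.5)   # midpoint
extrapolated = synthesize(seen_feats["run"], seen_feats["jump"], alpha=1.3)   # beyond "jump"
```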
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Dominant pre-training work for video-text retrieval mainly adopts "dual-encoder" architectures to enable efficient retrieval, where two separate encoders are used to contrast global video and text representations but ignore detailed local semantics.
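The "dual-encoder" contrastive setup the excerpt criticizes can be summarized as follows: each encoder produces one global embedding, and a symmetric InfoNCE loss pulls matched video-text pairs together. Encoder internals are placeholders here; only the loss structure is shown.

```python
# Symmetric contrastive (InfoNCE) loss over global video/text embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(video_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature            # (B, B) similarity matrix
    targets = torch.arange(len(v))            # matched pairs sit on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

video_emb = torch.randn(8, 256)  # stand-in for a video encoder's output
text_emb = torch.randn(8, 256)   # stand-in for a text encoder's output
loss = contrastive_loss(video_emb, text_emb)
```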
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Our goal in this paper is the adaptation of image-text models for long video retrieval.
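One simple adaptation explored in this line of work is pooling per-frame CLIP image embeddings into a single video embedding, weighting frames by their similarity to the text query. The sketch below assumes the frame and query embeddings are already computed; shapes, the temperature, and the pooling choice are illustrative, not the paper's definitive recipe.

```python
# Query-weighted mean pooling of frame embeddings for long-video retrieval.
import torch
import torch.nn.functional as F

def query_weighted_pool(frame_embs: torch.Tensor, query_emb: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    # frame_embs: (num_frames, dim) image embeddings for sampled frames
    # query_emb:  (dim,) text embedding of the retrieval query
    frames = F.normalize(frame_embs, dim=-1)
    query = F.normalize(query_emb, dim=-1)
    weights = torch.softmax(frames @ query / temperature, dim=0)  # (num_frames,)
    return (weights.unsqueeze(-1) * frame_embs).sum(dim=0)

video_emb = query_weighted_pool(torch.randn(64, 512), torch.randn(512))
```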
Global Semantic Descriptors for Zero-Shot Action Recognition
This work introduces a new ZSAR method based on the relationships of actions-objects and actions-descriptive sentences.
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
We adapt a VL model for zero-shot and few-shot action recognition using a collection of unlabeled videos and an unpaired action dictionary.
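A hedged sketch of the matching step this excerpt implies: score each unlabeled video against the action dictionary with a pretrained vision-language model and keep the best match as a pseudo-label for finetuning. `encode_video` and `encode_text` are placeholder encoders; the paper's full pipeline (expansion with language knowledge, the finetuning objective) is not reproduced here.

```python
# Pseudo-label unlabeled videos against an unpaired action dictionary.
import torch
import torch.nn.functional as F

def encode_video(video) -> torch.Tensor:      # placeholder VL video encoder
    return torch.randn(512)

def encode_text(text: str) -> torch.Tensor:   # placeholder VL text encoder
    return torch.randn(512)

action_dictionary = ["archery", "juggling", "rock climbing"]
text_embs = F.normalize(torch.stack([encode_text(a) for a in action_dictionary]), dim=-1)

def pseudo_label(video) -> str:
    v = F.normalize(encode_video(video), dim=-1)
    return action_dictionary[int((text_embs @ v).argmax())]
```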
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.
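Multimodal prompting in the spirit of this excerpt (not Vita-CLIP's exact scheme) prepends small sets of learnable prompt vectors to the video-token and text-token sequences, so that only the prompts are trained while the backbone stays frozen. The module below shows just that prepending step.

```python
# Learnable prompt vectors prepended to a token sequence; backbone not shown.
import torch
import torch.nn as nn

class PromptedTokens(nn.Module):
    def __init__(self, num_prompts: int, dim: int):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim) -> (batch, num_prompts + seq_len, dim)
        batch = tokens.shape[0]
        return torch.cat([self.prompts.expand(batch, -1, -1), tokens], dim=1)

video_prompting = PromptedTokens(num_prompts=8, dim=768)   # sizes are illustrative
prompted_video = video_prompting(torch.randn(2, 196, 768))
```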
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Specifically, we utilize a multi-scale approach to generate video-related descriptions.