Action Detection
233 papers with code • 11 benchmarks • 33 datasets
Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which only classifies which action is occurring and typically assumes a trimmed video.
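A tubelet can be thought of as one bounding box per frame, linked over a contiguous span of time, together with an action label. The sketch below is purely illustrative; the class and field names are assumptions, not a standard API.

```python
from dataclasses import dataclass, field

@dataclass
class Tubelet:
    """An action tubelet: one box per frame, linked across time.

    Illustrative structure only; names are assumptions, not a
    standard library interface.
    """
    label: str
    start_frame: int
    boxes: list = field(default_factory=list)  # (x1, y1, x2, y2) per frame

    @property
    def end_frame(self) -> int:
        # Last frame covered by the tubelet (inclusive).
        return self.start_frame + len(self.boxes) - 1

# A hypothetical "waving" action over frames 10-12, box drifting right.
tube = Tubelet(label="waving", start_frame=10,
               boxes=[(40, 30, 80, 120), (42, 30, 82, 120), (45, 31, 85, 121)])
assert tube.end_frame == 12
```

A trimmed-video action recognizer would only need `label`; a temporal localizer would only need `start_frame` and `end_frame`; action detection requires all three pieces.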
Most implemented papers
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization
End-to-end diarization presents an attractive alternative to standard cascaded diarization systems because a single system can handle all aspects of the task at once.
Temporal Action Localization with Enhanced Instant Discriminability
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
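Predicted action boundaries in TAD are commonly matched to ground truth by temporal IoU (intersection over union of the two time intervals). A minimal sketch, with segments given as `(start, end)` pairs:

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments (frames or seconds)."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

# Prediction (2.0, 6.0) vs ground truth (3.0, 7.0):
# intersection = 3.0, union = 5.0, so tIoU = 0.6
print(temporal_iou((2.0, 6.0), (3.0, 7.0)))  # 0.6
```

Benchmarks typically report mean average precision at one or more tIoU thresholds (e.g. 0.5), analogous to spatial IoU thresholds in object detection.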
Single Shot Temporal Action Detection
The main drawback of this framework is that the boundaries of action instance proposals have been fixed during the classification step.
Learning Latent Super-Events to Detect Multiple Activities in Videos
In this paper, we introduce the concept of learning latent super-events from activity videos, and present how it benefits activity detection in continuous videos.
SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos
A total of 6,637 temporal annotations are automatically parsed from online match reports at a one-minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution).
Temporal Recurrent Networks for Online Action Detection
Most work on temporal action detection is formulated as an offline problem, in which the start and end times of actions are determined after the entire video is fully observed.
Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection
Fine-grained action detection is an important task with numerous applications in robotics and human-computer interaction.
Actor Conditioned Attention Maps for Video Action Detection
While observing complex events with multiple actors, humans do not assess each actor separately, but infer from the context.
Personal VAD: Speaker-Conditioned Voice Activity Detection
In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Videos capture events that typically contain multiple sequential, and simultaneous, actions even in the span of only a few seconds.