Action Detection
235 papers with code • 11 benchmarks • 33 datasets
Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: sequences of action bounding boxes linked across time in the video. The task is related to temporal action localization, which seeks only to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
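As a minimal sketch of the tubelet representation described above (not tied to any specific paper or library; the class and field names are illustrative), a tubelet can be modeled as an action label plus per-frame boxes linked in temporal order:

from dataclasses import dataclass, field

@dataclass
class ActionTubelet:
    """An action instance localized in both space and time."""
    label: str                                  # action class, e.g. "jumping"
    start_frame: int                            # first frame the action appears in
    boxes: list = field(default_factory=list)   # one (x1, y1, x2, y2) box per frame

    @property
    def end_frame(self) -> int:
        return self.start_frame + len(self.boxes) - 1

    def extend(self, box):
        """Link a detection from the next frame onto this tubelet."""
        self.boxes.append(box)

# Usage: a three-frame tubelet for one detected action.
tube = ActionTubelet(label="jumping", start_frame=10, boxes=[(12, 30, 60, 110)])
tube.extend((14, 28, 62, 108))
tube.extend((15, 27, 63, 107))
print(tube.label, tube.start_frame, tube.end_frame)  # jumping 10 12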
Libraries
Use these libraries to find Action Detection models and implementations
Datasets
Subtasks
Latest papers
Centre Stage: Centricity-based Audio-Visual Temporal Action Detection
Previous one-stage action detection approaches have modelled temporal dependencies using only the visual modality.
ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors
ChimpACT is both comprehensive and challenging, consisting of 163 videos with a cumulative 160,500 frames, each richly annotated with detection, identification, pose estimation, and fine-grained spatiotemporal behavior labels.
Boundary Discretization and Reliable Classification Network for Temporal Action Detection
Specifically, the boundary discretization module (BDM) elegantly merges anchor-based and anchor-free approaches in the form of boundary discretization, avoiding the handcrafted anchor design required by traditional mixed methods.
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios
ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., an electric screwdriver) and equipment (e.g., an oscilloscope).
Temporal Action Localization with Enhanced Instant Discriminability
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
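Predicted segments in TAD are conventionally matched to ground truth using temporal IoU (tIoU), the overlap-over-union of two (start, end) intervals. A minimal sketch of this standard measure follows; the function name is illustrative, not drawn from a specific evaluation toolkit:

def temporal_iou(pred, gt):
    """tIoU between two (start, end) segments given in frames or seconds."""
    p_start, p_end = pred
    g_start, g_end = gt
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0

# A prediction spanning [12.0, 20.0] vs. ground truth [10.0, 18.0]:
print(temporal_iou((12.0, 20.0), (10.0, 18.0)))  # 0.6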
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers
We present COMEDIAN, a novel pipeline to initialize spatiotemporal transformers for action spotting, which involves self-supervised learning and knowledge distillation.
Progression-Guided Temporal Action Detection in Videos
The framework locates actions in videos by detecting the action evolution process.
Memory-and-Anticipation Transformer for Online Action Understanding
Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations
Two metrics are proposed to evaluate AER performance with automatic segmentation based on time-weighted emotion and speaker classification errors.
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development
We introduce "ivrit.ai", a comprehensive Hebrew speech dataset, addressing the distinct lack of extensive, high-quality resources for advancing Automated Speech Recognition (ASR) technology in Hebrew.