Action Detection
235 papers with code • 11 benchmarks • 33 datasets
Action Detection aims to find both where and when an action occurs within a video clip and to classify what action is taking place. Results are typically given in the form of action tubelets: action bounding boxes linked across time in the video. This is related to temporal localization, which seeks to identify the start and end frame of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
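As a minimal sketch of the tubelet idea, the structure below links one bounding box per frame under a single action label; the `Tubelet` class and its fields are hypothetical, not part of any particular detector's API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: an action "tubelet" is a sequence of per-frame
# bounding boxes linked across time, plus one action label.
@dataclass
class Tubelet:
    label: str                                  # predicted action class
    start_frame: int                            # first frame of the temporal extent
    boxes: list = field(default_factory=list)   # one (x1, y1, x2, y2) box per frame

    @property
    def end_frame(self) -> int:
        # Last frame covered by the tubelet.
        return self.start_frame + len(self.boxes) - 1

# Build a 3-frame tubelet for a "running" action whose box drifts right.
t = Tubelet(label="running", start_frame=10)
for dx in range(3):
    t.boxes.append((50 + dx, 80, 120 + dx, 200))

print(t.label, t.start_frame, t.end_frame)  # running 10 12
```

Temporal localization would keep only `start_frame`/`end_frame`, while trimmed action recognition would keep only `label`.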
Most implemented papers
Actions as Moving Points
The existing action tubelet detectors often depend on heuristic anchor design and placement, which might be computationally expensive and sub-optimal for precise localization.
Harvesting Ambient RF for Presence Detection Through Deep Learning
With presence detection, how to collect training data with human presence can have a significant impact on the performance.
PaStaNet: Toward Human Activity Knowledge Engine
In light of this, we propose a new path: infer human part states first and then reason out the activities based on part-level semantics.
Asynchronous Interaction Aggregation for Action Detection
We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection.
VoxLingua107: a Dataset for Spoken Language Recognition
Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.
Generic Event Boundary Detection: A Benchmark for Event Segmentation
This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.
Relaxed Transformer Decoders for Direct Action Proposal Generation
Extensive experiments on the THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of RTD-Net on both tasks of temporal action proposal generation and temporal action detection.
ROAD: The ROad event Awareness Dataset for Autonomous Driving
We also report the performance on the ROAD tasks of SlowFast and YOLOv5 detectors, as well as that of the winners of the ICCV 2021 ROAD challenge, highlighting the challenges faced by situation awareness in autonomous driving.
End-to-end speaker segmentation for overlap-aware resegmentation
Experiments on multiple speaker diarization datasets conclude that our model can be used with great success on both voice activity detection and overlapped speech detection.
Long Short-Term Transformer for Online Action Detection
We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.