Action Detection

233 papers with code • 11 benchmarks • 33 datasets

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
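
To make the tubelet idea concrete, here is a minimal sketch of how a tubelet could be represented and compared against ground truth. The `Tubelet` class and the function names are hypothetical and do not come from any library listed on this page. The spatio-temporal IoU shown (mean per-frame box IoU over shared frames, multiplied by temporal IoU) is one common formulation; individual benchmarks may define the metric differently. The sketch also assumes each tubelet covers a contiguous range of frames.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Tubelet:
    """Hypothetical container: one action label plus one box per frame."""
    label: str
    boxes: Dict[int, Box]  # frame index -> bounding box

    @property
    def start(self) -> int:
        return min(self.boxes)

    @property
    def end(self) -> int:
        return max(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(p: Tubelet, g: Tubelet) -> float:
    """IoU of the inclusive [start, end] frame intervals (the 'when')."""
    inter = min(p.end, g.end) - max(p.start, g.start) + 1
    union = max(p.end, g.end) - min(p.start, g.start) + 1
    return max(0, inter) / union


def spatiotemporal_iou(p: Tubelet, g: Tubelet) -> float:
    """Mean box IoU over shared frames (the 'where'), scaled by temporal IoU."""
    shared = sorted(set(p.boxes) & set(g.boxes))
    if not shared:
        return 0.0
    spatial = sum(box_iou(p.boxes[t], g.boxes[t]) for t in shared) / len(shared)
    return spatial * temporal_iou(p, g)


# Illustrative data only: same box in every frame, shifted in time.
pred = Tubelet("run", {t: (0.0, 0.0, 10.0, 10.0) for t in range(0, 10)})
gt = Tubelet("run", {t: (0.0, 0.0, 10.0, 10.0) for t in range(5, 15)})
score = spatiotemporal_iou(pred, gt)
is_correct = pred.label == gt.label and score >= 0.2  # threshold is arbitrary
```

In this toy case the boxes coincide in every shared frame, so the score is determined entirely by the temporal overlap. Temporal localization corresponds to using `temporal_iou` alone, while action recognition corresponds to comparing only the `label` field.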

Benchmarks

These leaderboards are used to track progress in Action Detection

Dataset	Best Model
J-HMDB	HIT
Charades	TTM
UCF101-24	STAR/L
Multi-THUMOS	MLAD
UCF Sports	T-CNN
THUMOS'14	MAT (Ours) Trans
TSU	PDAN
TTStroke-21 ME22	STCNN-V2 (Vote decision)
TTStroke-21 ME21	STCNN
MultiSports	HIT
MultiTHUMOS	PAT

Libraries

Use these libraries to find Action Detection models and implementations

open-mmlab/mmaction2 • 6 papers • 3,876
alibaba-damo-academy/FunASR • 3 papers • 3,115
Frostinassiky/gtad • 3 papers • 216
towhee-io/towhee • 2 papers • 2,972

See all 6 libraries.

Subtasks

Audio-Visual Active Speaker Detection

Fine-Grained Action Detection

Action Triplet Detection

Few Shot Temporal Action Localization

Multiple Action Detection

Most implemented papers

From Recognition to Prediction: Analysis of Human Action and Trajectory Prediction in Video

JunweiLiang/Multiverse • 20 Nov 2020

With the advancement in computer vision deep learning, systems now are able to analyze an unprecedented amount of rich visual information from videos to enable applications such as autonomous driving, socially-aware robot assistant and public safety monitoring.

Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

imatge-upc/activitynet-2016-cvprw • 29 Aug 2016

This thesis explores different approaches using Convolutional and Recurrent Neural Networks to classify and temporally localize activities in videos, and proposes an implementation to achieve this.

An End-to-End Architecture for Keyword Spotting and Voice Activity Detection

mindorii/kws • 28 Nov 2016

We propose a single neural network architecture for two tasks: on-line keyword spotting and voice activity detection.

R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

VisionLearningGroup/R-C3D • ICCV 2017

We address the problem of activity detection in continuous, untrimmed video streams.

Fine-grained Activity Recognition in Baseball Videos

piergiaj/mlb-youtube • 9 Apr 2018

In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection.

rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

zhenghuatan/rVAD • 9 Jun 2019

In the end, a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal for detecting voice activity.

pyannote.audio: neural building blocks for speaker diarization

pyannote/pyannote-audio • 4 Nov 2019

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization.

A Multigrid Method for Efficiently Training Video Models

facebookresearch/SlowFast • CVPR 2020

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

Siyu-C/ACAR-Net • CVPR 2021

We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.

Context-Aware RCNN: A Baseline for Action Detection in Videos

MCG-NJU/CRCNN-Action • ECCV 2020

In this work, we first empirically find the recognition accuracy is highly correlated with the bounding box size of an actor, and thus higher resolution of actors contributes to better performance.
