Temporal Action Localization
422 papers with code • 14 benchmarks • 42 datasets
Temporal Action Localization aims to detect activity instances in an untrimmed video stream and output the start and end timestamps of each instance. It is closely related to Temporal Action Proposal Generation.
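Since predictions are temporal segments, localization quality is typically scored by temporal intersection-over-union (tIoU) between a predicted segment and a ground-truth segment. A minimal sketch (the function name and the example timestamps are illustrative, not from any specific benchmark toolkit):

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, in seconds.

    Returns a value in [0, 1]; 0 when the segments do not overlap.
    """
    inter_start = max(pred[0], gt[0])
    inter_end = min(pred[1], gt[1])
    intersection = max(0.0, inter_end - inter_start)
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0

# A prediction covering 2s-6s against a ground-truth action at 4s-8s:
score = temporal_iou((2.0, 6.0), (4.0, 8.0))  # 2s overlap / 6s union ≈ 0.33
```

Benchmarks usually report mean Average Precision at several tIoU thresholds (e.g. 0.5 to 0.95), counting a detection as correct when its tIoU with an unmatched ground-truth segment exceeds the threshold.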
Libraries
Use these libraries to find Temporal Action Localization models and implementations
Datasets
Subtasks
Most implemented papers
Representation Flow for Action Recognition
Our representation flow layer is a fully-differentiable layer designed to capture the 'flow' of any representation channel within a convolutional neural network for action recognition.
Explaining NonLinear Classification Decisions with Deep Taylor Decomposition
Although our focus is on image classification, the method is applicable to a broad set of input data, learning tasks and network architectures.
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets, applied to spatiotemporal feature matrices, are able to exploit spatiotemporal dynamics to improve overall performance.
Im2Flow: Motion Hallucination from Static Images for Action Recognition
Second, we show the power of hallucinated flow for recognition, successfully transferring the learned motion into a standard two-stream network for activity recognition.
Moments in Time Dataset: one million videos for event understanding
We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds.
Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition
In addition, the second-order information (the lengths and directions of bones) of the skeleton data, which is naturally more informative and discriminative for action recognition, is rarely investigated in existing methods.
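The second-order (bone) information mentioned above can be derived directly from the joint coordinates: each bone is the vector from a joint's parent to the joint itself, which carries length and direction. A minimal sketch, assuming a hypothetical 5-joint chain skeleton (the parent table and array shapes are illustrative, not the paper's graph definition):

```python
import numpy as np

# Hypothetical parent index per joint; the root (joint 0) points to itself.
PARENTS = [0, 0, 1, 2, 3]

def bone_features(joints):
    """Second-order skeleton features: vector from each joint's parent to the joint.

    joints: (J, 3) array of 3D joint coordinates for one frame.
    Returns a (J, 3) array; the root's bone is the zero vector.
    """
    joints = np.asarray(joints, dtype=float)
    return joints - joints[PARENTS]

frame = np.array([[0, 0, 0],   # root
                  [0, 1, 0],
                  [0, 2, 0],
                  [1, 2, 0],
                  [2, 2, 0]], dtype=float)
bones = bone_features(frame)  # e.g. bones[3] = joint 3 - joint 2 = [1, 0, 0]
```

In a two-stream setup, one stream consumes the joint coordinates and the other consumes these bone vectors, with their scores fused at the end.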
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment
Can performance on the task of action quality assessment (AQA) be improved by exploiting a description of the action and its quality?
Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition
Deep neural networks based purely on attention have been successful across several domains, relying on minimal architectural priors from the designer.
Action Recognition with Dynamic Image Networks
This is a powerful idea because it makes it possible to convert any video into a single image, so that existing CNN models pre-trained on still images can be applied directly to videos.
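The video-to-image idea can be illustrated with a linearly time-weighted sum of frames, where later frames receive larger weights so the result encodes temporal evolution. This is a simplified stand-in for the paper's rank-pooling construction, not its exact coefficients:

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a (T, H, W, C) video into one (H, W, C) image via a
    linearly time-weighted sum (simplified rank-pooling-style weights)."""
    frames = np.asarray(frames, dtype=float)
    T = frames.shape[0]
    # Weights -T+1, -T+3, ..., T-1: symmetric around zero, increasing in time.
    alphas = 2.0 * np.arange(1, T + 1) - T - 1
    return np.tensordot(alphas, frames, axes=1)

video = np.random.rand(8, 4, 4, 3)
img = dynamic_image(video)  # shape (4, 4, 3), feedable to an image CNN
```

Because the weights sum to zero, a static video collapses to an all-zero image: only the motion between frames survives the pooling.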
Hidden Two-Stream Convolutional Networks for Action Recognition
State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs.