Temporal Action Localization
422 papers with code • 14 benchmarks • 42 datasets
Temporal Action Localization aims to detect action instances in untrimmed videos and output the start and end timestamps of each instance. It is closely related to Temporal Action Proposal Generation.
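Predictions in this task are scored by how well a predicted segment overlaps a ground-truth segment in time, commonly measured with temporal Intersection over Union (tIoU). A minimal sketch of that metric (the segment values below are illustrative, not from any dataset):

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) in seconds."""
    # Overlap length, clamped at zero for disjoint segments.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction spanning 2.0-6.0 s against a ground truth spanning 4.0-8.0 s:
# intersection = 2.0 s, union = 6.0 s, so tIoU = 1/3.
print(round(temporal_iou((2.0, 6.0), (4.0, 8.0)), 3))  # → 0.333
```

Benchmark results are typically reported as mean Average Precision at several tIoU thresholds (e.g. 0.5), where a prediction counts as correct only if its tIoU with a ground-truth instance exceeds the threshold.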
Most implemented papers
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes and clips and the unconstrained nature of those clips.
Two-Stream Convolutional Networks for Action Recognition in Videos
Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art.
Multivariate LSTM-FCNs for Time Series Classification
Over the past decade, multivariate time series classification has received great attention.
G-TAD: Sub-Graph Localization for Temporal Action Detection
In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem.
Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation
Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets.
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.
Describing Videos by Exploiting Temporal Structure
In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions.
Towards Good Practices for Very Deep Two-Stream ConvNets
However, for action recognition in videos, the improvement of deep convolutional networks is not so evident.
Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks
Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing.