Temporal Action Localization

422 papers with code • 14 benchmarks • 42 datasets

Temporal Action Localization aims to detect activities in a video stream and output the beginning and end timestamps of each detected activity. It is closely related to Temporal Action Proposal Generation.
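
As a rough illustration of what "beginning and end timestamps" can look like in practice, the sketch below represents each detection as a labeled time segment and compares a prediction with a ground-truth segment using temporal intersection-over-union, a common overlap measure for this task. The `ActionSegment` type, the `temporal_iou` helper, and the example timestamps are hypothetical names and values chosen for illustration; they are not taken from any paper or library listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSegment:
    """Hypothetical container for one localized action."""
    label: str
    start: float  # beginning timestamp, in seconds
    end: float    # end timestamp, in seconds


def temporal_iou(a: ActionSegment, b: ActionSegment) -> float:
    """Overlap of two segments divided by the length of their union."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


# Illustrative values only: a predicted segment and an annotated one.
prediction = ActionSegment("long jump", 12.0, 18.0)
ground_truth = ActionSegment("long jump", 13.0, 19.0)
overlap = temporal_iou(prediction, ground_truth)
print(f"temporal IoU: {overlap:.3f}")
```

A prediction is typically counted as correct when its label matches and its temporal IoU with a ground-truth segment exceeds a chosen threshold; the threshold itself varies by benchmark.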

Most implemented papers

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

mindspore-ai/models 5 Nov 2018

In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

wushidonguc/two-stream-action-recognition-keras 3 Dec 2012

To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes, large number of clips and also unconstrained nature of such clips.

Two-Stream Convolutional Networks for Action Recognition in Videos

feichtenhofer/twostreamfusion NeurIPS 2014

Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art.

Multivariate LSTM-FCNs for Time Series Classification

houshd/MLSTM-FCN 14 Jan 2018

Over the past decade, multivariate time series classification has received great attention.

G-TAD: Sub-Graph Localization for Temporal Action Detection

Frostinassiky/gtad CVPR 2020

In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem.

Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation

hikvision-research/skelact 17 Apr 2018

Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets.

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

rwightman/pytorch-image-models CVPR 2023

We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.

Describing Videos by Exploiting Temporal Structure

yaoli/arctic-capgen-vid ICCV 2015

In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions.

Towards Good Practices for Very Deep Two-Stream ConvNets

yjxiong/caffe 8 Jul 2015

However, for action recognition in videos, the improvement of deep convolutional networks is not so evident.

Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

Tangshitao/ClipShots_basline 23 May 2017

Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing.