Temporal Action Localization
422 papers with code • 14 benchmarks • 42 datasets
Temporal Action Localization aims to detect action instances in untrimmed videos and output the start and end timestamps of each instance. It is closely related to Temporal Action Proposal Generation.
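Predictions in this task are scored by how well a predicted segment overlaps a ground-truth segment in time, commonly measured with temporal Intersection over Union (tIoU). A minimal sketch of that metric (the segment values below are illustrative, not from any dataset):

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) in seconds."""
    # Overlap length, clamped at zero for disjoint segments.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction spanning 2.0-6.0 s against a ground truth spanning 4.0-8.0 s:
# intersection = 2.0 s, union = 6.0 s, so tIoU = 1/3.
print(round(temporal_iou((2.0, 6.0), (4.0, 8.0)), 3))  # → 0.333
```

Benchmark results are typically reported as mean Average Precision at several tIoU thresholds (e.g. 0.5), where a prediction counts as correct only if its tIoU with a ground-truth instance exceeds the threshold.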
Most implemented papers
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes and clips and the unconstrained nature of those clips.
Two-Stream Convolutional Networks for Action Recognition in Videos
Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art.
Multivariate LSTM-FCNs for Time Series Classification
Over the past decade, multivariate time series classification has received great attention.
G-TAD: Sub-Graph Localization for Temporal Action Detection
In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem.
Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation
Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets.
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.
Describing Videos by Exploiting Temporal Structure
In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions.
Towards Good Practices for Very Deep Two-Stream ConvNets
However, for action recognition in videos, the improvement of deep convolutional networks is not so evident.
Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks
Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing.