Temporal Action Localization

165 papers with code · Computer Vision

Temporal Action Localization aims to detect activities in a video stream and output the beginning and end timestamps of each detected activity. It is closely related to Temporal Action Proposal Generation.
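As a rough illustration of the output format described above, the sketch below turns per-frame action scores into (start, end) segments in seconds. It is only a simple thresholding baseline, not the method of any paper listed here; the names `Segment` and `scores_to_segments`, the `fps` argument, and the 0.5 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # beginning timestamp, in seconds
    end: float    # end timestamp, in seconds


def scores_to_segments(scores: List[float], fps: float,
                       threshold: float = 0.5) -> List[Segment]:
    """Group consecutive frames scoring >= threshold into segments.

    Hypothetical baseline: `scores` holds one action score per frame.
    """
    segments: List[Segment] = []
    start_idx = None  # index of the first frame of the open segment
    for i, score in enumerate(scores):
        if score >= threshold and start_idx is None:
            start_idx = i  # a new segment begins at this frame
        elif score < threshold and start_idx is not None:
            # The segment ends just before the current frame.
            segments.append(Segment(start_idx / fps, i / fps))
            start_idx = None
    if start_idx is not None:
        # Close a segment that runs to the end of the video.
        segments.append(Segment(start_idx / fps, len(scores) / fps))
    return segments
```

For example, `scores_to_segments([0.1, 0.9, 0.8, 0.2, 0.7], fps=1.0)` yields two segments, one covering seconds 1 to 3 and one covering seconds 4 to 5.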

Greatest papers with code

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

CVPR 2018 tensorflow/models

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

ACTION RECOGNITION · VIDEO UNDERSTANDING

BMN: Boundary-Matching Network for Temporal Action Proposal Generation

ICCV 2019 PaddlePaddle/models

To address these difficulties, we introduce the Boundary-Matching (BM) mechanism to evaluate confidence scores of densely distributed proposals, which denote a proposal as a matching pair of starting and ending boundaries and combine all densely distributed BM pairs into the BM confidence map.

ACTION DETECTION · TEMPORAL ACTION PROPOSAL GENERATION
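The idea of treating each proposal as a pair of starting and ending boundaries, and collecting all such pairs into one map, can be sketched as a 2D grid. This is only an index-layout illustration, not the BMN network itself: the (duration, start) layout, the function name `boundary_matching_grid`, and both parameters are assumptions made for the example, and a real confidence map would store a predicted score at each valid cell.

```python
from typing import List, Optional, Tuple

Pair = Optional[Tuple[int, int]]


def boundary_matching_grid(num_locations: int,
                           max_duration: int) -> List[List[Pair]]:
    """Enumerate (start, end) boundary pairs on a duration-by-start grid.

    Row d-1 holds proposals of duration d; column s holds proposals
    starting at temporal location s. Cells whose end boundary would
    fall past the last location are marked None.
    """
    grid: List[List[Pair]] = []
    for duration in range(1, max_duration + 1):
        row: List[Pair] = []
        for start in range(num_locations):
            end = start + duration
            row.append((start, end) if end <= num_locations else None)
        grid.append(row)
    return grid
```

With `boundary_matching_grid(4, 3)`, the cell for duration 3 and start 1 holds the pair `(1, 4)`, while the cell for duration 3 and start 2 is `None` because that proposal would run past the end of the video.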

BSN: Boundary Sensitive Network for Temporal Action Proposal Generation

ECCV 2018 PaddlePaddle/models

Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing real-world videos with long duration and a high proportion of irrelevant content.

ACTION DETECTION · TEMPORAL ACTION PROPOSAL GENERATION

Large-scale weakly-supervised pre-training for video action recognition

CVPR 2019 microsoft/computervision-recipes

Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?

ACTION CLASSIFICATION · ACTION RECOGNITION · ACTIVITY RECOGNITION IN VIDEOS · EGOCENTRIC ACTIVITY RECOGNITION · TRANSFER LEARNING

A Closer Look at Spatiotemporal Convolutions for Action Recognition

CVPR 2018 microsoft/computervision-recipes

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.

ACTION RECOGNITION

Temporal Segment Networks for Action Recognition in Videos

8 May 2017 open-mmlab/mmaction

Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.

Ranked #10 on Action Classification on Moments in Time (Top 5 Accuracy metric)

ACTION CLASSIFICATION · ACTION RECOGNITION · ACTION RECOGNITION IN VIDEOS

Sparse 3D convolutional neural networks

12 May 2015 facebookresearch/SparseConvNet

We have implemented a convolutional neural network designed for processing sparse three-dimensional input data.

3D OBJECT RECOGNITION · TEMPORAL ACTION LOCALIZATION

What Makes Training Multi-Modal Classification Networks Hard?

CVPR 2020 facebookresearch/R2Plus1D

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart.

ACTION CLASSIFICATION · ACTION RECOGNITION