Spatio-Temporal Action Localization

13 papers with code • 1 benchmark • 6 datasets

Spatio-temporal action localization (also called video action detection) is the task of detecting actions in video in both space and time: localizing who performs which action, where in each frame, and over which frames.

Most implemented papers

E^2TAD: An Energy-Efficient Tracking-based Action Detector

VITA-Group/21LPCV-UAV-Solution 9 Apr 2022

Video action detection (spatio-temporal action localization) is typically the starting point for human-centric intelligent analysis of videos.

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

opengvlab/unmasked_teacher ICCV 2023

Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain.

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

OpenGVLab/VideoMAEv2 CVPR 2023

Finally, we successfully train a video ViT model with a billion parameters, which achieves new state-of-the-art performance on the Kinetics datasets (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2).