Spatio-Temporal Action Localization
13 papers with code • 1 benchmark • 6 datasets
Most implemented papers
E^2TAD: An Energy-Efficient Tracking-based Action Detector
Video action detection (also known as spatio-temporal action localization) is often the starting point for human-centric intelligent analysis of videos.
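Spatio-temporal action localization is typically evaluated by matching predicted action "tubes" (a box per frame) against ground truth with a spatio-temporal IoU. The exact definition varies by benchmark; the sketch below shows one common variant (mean per-frame box IoU over the temporal overlap, scaled by temporal IoU), with illustrative function names:

```python
def box_iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def tube_iou(tube_a, tube_b):
    """Spatio-temporal IoU of two action tubes.

    Each tube maps frame index -> box. Score = mean box IoU over the
    frames both tubes cover, times the temporal IoU (shared frames
    divided by the union of frames).
    """
    shared = set(tube_a) & set(tube_b)
    if not shared:
        return 0.0
    spatial = sum(box_iou(tube_a[f], tube_b[f]) for f in shared) / len(shared)
    temporal = len(shared) / len(set(tube_a) | set(tube_b))
    return spatial * temporal
```

A detection is then usually counted as correct when its tube IoU with a same-class ground-truth tube exceeds a threshold (e.g. 0.5), feeding into a video-mAP metric.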
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Previous video foundation models (VFMs) rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain.
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Finally, we successfully train a video ViT model with a billion parameters, which achieves new state-of-the-art performance on the Kinetics datasets (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2).
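The "dual masking" in the title refers to masking the inputs of both the encoder (tube masking at a high ratio, so the same spatial patches are hidden across frames) and the decoder (only a subset of the hidden tokens is reconstructed). A minimal illustration of the idea, not the paper's implementation; all names and ratios below are assumptions for the sketch:

```python
import numpy as np

def dual_masks(num_frames, num_patches, enc_ratio=0.9, dec_ratio=0.5, seed=0):
    """Illustrative dual masking for a video masked autoencoder.

    enc_mask: True = token hidden from the encoder (tube masking: the
    same spatial patches are dropped in every frame, at a high ratio).
    dec_mask: True = hidden token the decoder actually reconstructs
    (a random subset here; the paper uses a structured scheme).
    """
    rng = np.random.default_rng(seed)
    hidden = np.zeros(num_patches, dtype=bool)
    hidden[rng.choice(num_patches, int(num_patches * enc_ratio), replace=False)] = True
    # Share the spatial mask across all frames -> "tubes" of hidden patches.
    enc_mask = np.tile(hidden, (num_frames, 1))
    # Reconstruct only a fraction of the hidden tokens to cut decoder cost.
    dec_mask = enc_mask & (rng.random(enc_mask.shape) < dec_ratio)
    return enc_mask, dec_mask
```

With a 90% encoder ratio the encoder processes roughly 10% of tokens, and the decoder reconstructs only part of the remainder, which is what makes billion-parameter pre-training tractable.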