In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera.
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
#2 best model for Temporal Action Localization on J-HMDB-21
We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).
SOTA for Action Detection on Charades
The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost.
This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video.
#5 best model for Pose Tracking on PoseTrack2017 (using extra training data)
We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance.
#15 best model for Action Recognition In Videos on UCF101
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
#2 best model for Egocentric Activity Recognition on EPIC-Kitchens
In particular, we evaluate our method on the large-scale multi-modal YouTube-8M v2 dataset and outperform all other methods in the YouTube-8M Large-Scale Video Understanding challenge.
In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.
#15 best model for Action Recognition In Videos on Something-Something V1 (using extra training data)
An event happening in the world is often made of different activities and actions that can unfold simultaneously or sequentially within a few seconds.