Video Understanding

40 papers with code · Computer Vision
Subtask of Video

Leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection

CVPR 2020 tensorflow/models

In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera.

VIDEO OBJECT DETECTION VIDEO UNDERSTANDING

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

CVPR 2018 tensorflow/models

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

TEMPORAL ACTION LOCALIZATION VIDEO UNDERSTANDING

A Multigrid Method for Efficiently Training Video Models

CVPR 2020 facebookresearch/SlowFast

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

ACTION DETECTION ACTION RECOGNITION IN VIDEOS VIDEO UNDERSTANDING

Detect-and-Track: Efficient Pose Estimation in Videos

CVPR 2018 facebookresearch/DetectAndTrack

This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video.

#5 best model for Pose Tracking on PoseTrack2017 (using extra training data)

HUMAN DETECTION MULTI-OBJECT TRACKING POSE ESTIMATION POSE TRACKING VIDEO UNDERSTANDING

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

30 Mar 2017 jeffreyhuang1/two-stream-action-recognition

We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance.

ACTION CLASSIFICATION ACTION RECOGNITION IN VIDEOS VIDEO UNDERSTANDING

Learnable pooling with Context Gating for video classification

21 Jun 2017 antoine77340/Youtube-8M-WILLOW

In particular, we evaluate our method on the large-scale multi-modal Youtube-8M v2 dataset and outperform all other methods in the Youtube-8M Large-Scale Video Understanding challenge.

VIDEO CLASSIFICATION VIDEO UNDERSTANDING

ECO: Efficient Convolutional Network for Online Video Understanding

ECCV 2018 mzolfaghari/ECO-efficient-video-understanding

In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.

#15 best model for Action Recognition In Videos on Something-Something V1 (using extra training data)

ACTION CLASSIFICATION ACTION RECOGNITION IN VIDEOS VIDEO CAPTIONING VIDEO RETRIEVAL VIDEO UNDERSTANDING

Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

1 Nov 2019 zhoubolei/moments_models

An event happening in the world is often made of different activities and actions that can unfold simultaneously or sequentially within a few seconds.

ACTION DETECTION MULTI-LABEL LEARNING VIDEO UNDERSTANDING