Video Understanding

87 papers with code • 0 benchmarks • 25 datasets

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Greatest papers with code

Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection

tensorflow/models CVPR 2020

In this paper, we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera.

Video Object Detection, Video Understanding
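Context R-CNN is described by its authors as attending from the current frame to a long-term, per-camera memory bank of features from other frames. The sketch below shows only generic single-head scaled dot-product attention from current-frame box features to such a bank, in NumPy. The function name `attend_to_memory`, the shapes, the absence of learned projections, and the residual add are all simplifying assumptions, not the paper's exact architecture.

```python
import numpy as np

def attend_to_memory(queries, memory):
    """Single-head scaled dot-product attention from current-frame features
    (num_boxes, d) to a per-camera memory bank (num_memory, d).
    Simplified sketch: no learned query/key/value projections."""
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)        # (num_boxes, num_memory)
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over the memory bank
    context = weights @ memory                      # (num_boxes, d)
    # Add the retrieved temporal context back to the original features.
    return queries + context, weights

rng = np.random.default_rng(0)
boxes = rng.normal(size=(3, 16))   # hypothetical features for 3 boxes in the current frame
bank = rng.normal(size=(50, 16))   # hypothetical features from 50 other frames of the same camera
enriched, w = attend_to_memory(boxes, bank)
print(enriched.shape, w.shape)
```

Each box ends up with a feature vector of the same size, now mixed with whatever memory entries it attended to most strongly.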

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

tensorflow/models CVPR 2018

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

Action Recognition, Video Understanding
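"Multiple labels per person" means that one person box at one timestamp can carry several action ids at once. The sketch below groups label rows into per-person label sets using only the standard library. The CSV column order (video id, timestamp, normalized box corners, action id, person id) is an assumption about an AVA-style layout, and every row in `SAMPLE` is hypothetical, not real AVA data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in an assumed AVA-style layout:
# video_id, timestamp (s), x1, y1, x2, y2 (normalized box), action_id, person_id
SAMPLE = """\
vid001,0902,0.10,0.20,0.45,0.95,12,0
vid001,0902,0.10,0.20,0.45,0.95,17,0
vid001,0902,0.55,0.15,0.90,0.98,80,1
vid001,0903,0.11,0.21,0.46,0.95,12,0
"""

def group_labels(text):
    """Map (video_id, timestamp, person_id) -> [box, set of action ids]."""
    people = defaultdict(lambda: [None, set()])
    for row in csv.reader(io.StringIO(text)):
        video_id, ts, x1, y1, x2, y2, action_id, person_id = row
        key = (video_id, int(ts), int(person_id))
        people[key][0] = tuple(float(v) for v in (x1, y1, x2, y2))
        people[key][1].add(int(action_id))  # one person can hold several actions
    return dict(people)

for key, (box, actions) in sorted(group_labels(SAMPLE).items()):
    print(key, box, sorted(actions))
```

Here person 0 at timestamp 902 carries two actions (12 and 17), which is the multi-label case the excerpt refers to.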

Tiny Video Networks

google-research/google-research 15 Oct 2019

Video understanding is a challenging problem with great impact on the abilities of autonomous agents working in the real world.

Video Understanding

A Multigrid Method for Efficiently Training Video Models

facebookresearch/SlowFast CVPR 2020

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

Action Detection, Action Recognition (+1 more)
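The multigrid idea is to cycle through mini-batch shapes with different numbers of frames T and spatial sizes H × W, scaling the batch size so that each iteration costs roughly the same. The sketch below computes such a schedule. The function name, the base shape (8, 224, 224) with batch size 8, and the specific time and space factors are illustrative assumptions, not the exact schedule from the paper or the SlowFast code.

```python
import math

def multigrid_schedule(base_batch, base_shape, factors):
    """Return (batch_size, (T, H, W)) pairs whose batch size scales inversely
    with the number of voxels per clip, keeping per-iteration compute roughly
    constant. `factors` holds (time_factor, spatial_factor) multipliers."""
    T, H, W = base_shape
    base_voxels = T * H * W
    schedule = []
    for t_factor, s_factor in factors:
        shape = (max(1, round(T * t_factor)),
                 round(H * s_factor),
                 round(W * s_factor))
        voxels = shape[0] * shape[1] * shape[2]
        batch = max(1, round(base_batch * base_voxels / voxels))
        schedule.append((batch, shape))
    return schedule

# Illustrative factors, from coarse (short, small clips) to the fine base shape.
factors = [(0.25, 1 / math.sqrt(2)), (0.5, 1 / math.sqrt(2)), (0.5, 1.0), (1.0, 1.0)]
for batch, shape in multigrid_schedule(8, (8, 224, 224), factors):
    print(batch, shape)
```

With these numbers the batch size grows from 8 at the full (8, 224, 224) shape to 64 at the coarsest (2, 158, 158) shape, so the coarse phases process many more clips per step for about the same cost.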

TSM: Temporal Shift Module for Efficient Video Understanding

MIT-HAN-LAB/temporal-shift-module ICCV 2019

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.

Action Classification, Action Recognition (+3 more)
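TSM's core operation shifts part of the channels one step forward and another part one step backward along the time axis, so that ordinary 2D convolutions applied afterwards see information from neighboring frames without extra multiplications. The NumPy sketch below implements that bidirectional shift with zero padding at clip boundaries. The `(N * T, C, H, W)` layout and the 1/8-per-direction split (`fold_div=8`) are assumptions for illustration.

```python
import numpy as np

def temporal_shift(x, n_segment, fold_div=8):
    """Shift part of the channels along time.
    x: array of shape (N * T, C, H, W), frames of each clip stored contiguously.
    1/fold_div of the channels take values from the next frame, another
    1/fold_div from the previous frame; the remaining channels are unchanged.
    Positions vacated at clip boundaries are zero-padded."""
    nt, c, h, w = x.shape
    n = nt // n_segment
    x = x.reshape(n, n_segment, c, h, w)
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # frame t sees frame t + 1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # frame t sees frame t - 1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # untouched channels
    return out.reshape(nt, c, h, w)

# Toy input: 1 clip, 4 frames, 8 channels; every value equals its frame index.
frames = np.arange(4, dtype=float).reshape(4, 1, 1, 1) * np.ones((4, 8, 1, 1))
shifted = temporal_shift(frames, n_segment=4)
print(shifted[:, :, 0, 0])
```

With 8 channels, channel 0 reads [1, 2, 3, 0] over the four frames, channel 1 reads [0, 0, 1, 2], and channels 2 to 7 keep [0, 1, 2, 3].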

Temporal Interlacing Network

open-mmlab/mmaction2 17 Jan 2020

In this way, a heavy temporal model is replaced by a simple interlacing operator.

Optical Flow Estimation, Video Understanding
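As we understand the Temporal Interlacing Network, its interlacing operator moves groups of channels along time by learned, possibly fractional, offsets, which can be made differentiable with linear interpolation. The sketch below fixes the offsets by hand instead of learning them and omits the other components of the paper's operator; the function names and the (T, C) feature layout are hypothetical.

```python
import numpy as np

def integer_shift(x, k):
    """out[t] = x[t - k] for a (T, C) sequence, with zero padding."""
    T = x.shape[0]
    out = np.zeros_like(x)
    for t in range(T):
        src = t - k
        if 0 <= src < T:
            out[t] = x[src]
    return out

def fractional_shift(x, offset):
    """Shift by a real-valued offset: linear interpolation between the
    two nearest integer shifts floor(offset) and floor(offset) + 1."""
    lo = int(np.floor(offset))
    frac = offset - lo
    return (1.0 - frac) * integer_shift(x, lo) + frac * integer_shift(x, lo + 1)

def interlace(x, offsets):
    """Split channels into len(offsets) groups and shift each group by its own offset."""
    groups = np.array_split(np.arange(x.shape[1]), len(offsets))
    out = np.empty_like(x)
    for idx, off in zip(groups, offsets):
        out[:, idx] = fractional_shift(x[:, idx], off)
    return out

x = np.arange(4, dtype=float).reshape(4, 1) * np.ones((4, 2))  # values = frame index
print(interlace(x, offsets=[0.5, -1.0]))
```

The first channel group is moved half a frame later ([0, 0.5, 1.5, 2.5]) and the second one frame earlier ([1, 2, 3, 0]); because the result is linear in `frac`, a gradient with respect to the offset exists, which is what makes learning the offsets possible.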

Long-Term Feature Banks for Detailed Video Understanding

open-mmlab/mmaction2 CVPR 2019

To understand the world, we humans constantly need to relate the present to the past, and put events in context.

Action Classification, Action Recognition (+2 more)

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

jeffreyhuang1/two-stream-action-recognition 30 Mar 2017

We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices can exploit spatiotemporal dynamics to improve overall performance.

Action Classification, Action Recognition (+1 more)
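A "spatiotemporal feature matrix" here can be read as a (T, D) array holding one D-dimensional frame feature per time step. The sketch below applies a generic 1D temporal convolution over such a matrix and max-pools over time to get one clip descriptor. It illustrates the general idea of a temporal ConvNet only; it is not the paper's Temporal-Inception architecture, and the function name, shapes, and random weights are hypothetical.

```python
import numpy as np

def temporal_conv1d(features, kernel):
    """Valid 1D convolution along time (cross-correlation, as in most
    deep learning libraries).
    features: (T, D) matrix, one D-dim frame feature per time step.
    kernel: (K, D, F) weights mapping a window of K frames to F outputs.
    Returns an array of shape (T - K + 1, F)."""
    T, D = features.shape
    K, Dk, F = kernel.shape
    if D != Dk:
        raise ValueError("feature and kernel dimensions differ")
    out = np.empty((T - K + 1, F))
    for t in range(T - K + 1):
        window = features[t:t + K]                       # (K, D)
        out[t] = np.einsum("kd,kdf->f", window, kernel)  # sum over window and features
    return out

rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 32))   # e.g. per-frame CNN features for 10 frames
kernel = rng.normal(size=(3, 32, 4))
temporal = temporal_conv1d(feats, kernel)
clip_descriptor = temporal.max(axis=0)  # max-pool over time: one vector per clip
print(temporal.shape, clip_descriptor.shape)
```

An LSTM could consume the same (T, D) matrix step by step instead; the convolution variant trades sequential state for a fixed temporal receptive field of K frames.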