Video Object Detection

65 papers with code • 7 benchmarks • 10 datasets

Video object detection is the task of detecting objects in video sequences rather than in individual still images.

(Image credit: Learning Motion Priors for Efficient Video Object Detection)

Benchmarks

These leaderboards are used to track progress in Video Object Detection.

Dataset	Best Model
ImageNet VID	DiffusionVID (Swin-B)
EPIC KITCHENS-seen splits	Temporal ROI Align
EPIC KITCHENS-unseen splits	Temporal ROI Align
USC-GRAD-STDdb	SLTnet FPN-X101
EPIC-KITCHENS-55	Ours (Faster RCNN)
YT-BB	(none listed)
Waymo Open Dataset	(none listed)

Libraries

Use these libraries to find Video Object Detection models and implementations.

guanxiongsun/vfe.pytorch

4 papers

open-mmlab/mmtracking

3 papers

3,367

lingyunwu14/STFT

2 papers

Most implemented papers

Emerging Properties in Self-Supervised Vision Transformers

facebookresearch/dino • ICCV 2021

In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets).

TSM: Temporal Shift Module for Efficient Video Understanding

MIT-HAN-LAB/temporal-shift-module • ICCV 2019

The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost.

Mobile Video Object Detection with Temporally-Aware Feature Maps

tensorflow/models • CVPR 2018

This paper introduces an online model for object detection in videos designed to run in real-time on low-powered mobile and embedded devices.

Towards High Performance Video Object Detection for Mobiles

stanlee321/LightFlow-TensorFlow • 16 Apr 2018

In this paper, we present a lightweight network architecture for video object detection on mobiles.

Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection

tensorflow/models • CVPR 2020

In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera.

HoughNet: Integrating near and long-range evidence for visual detection

giddyyupp/coco-minitrain • 14 Apr 2021

This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method.

TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers

qianyuzqy/TransVOD_Lite • 13 Jan 2022

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while performing as well as previous complex hand-crafted detectors.
