Video Object Detection

66 papers with code • 7 benchmarks • 10 datasets

Video object detection is the task of detecting objects in the frames of a video rather than in still images.

(Image credit: Learning Motion Priors for Efficient Video Object Detection)

Benchmarks

These leaderboards are used to track progress in Video Object Detection.

Dataset	Best Model
ImageNet VID	DiffusionVID (Swin-B)
EPIC KITCHENS-seen splits	Temporal ROI Align
EPIC KITCHENS-unseen splits	Temporal ROI Align
USC-GRAD-STDdb	SLTnet FPN-X101
EPIC-KITCHENS-55	Ours (Faster RCNN)
YT-BB	
Waymo Open Dataset	

Libraries

Use these libraries to find Video Object Detection models and implementations.

guanxiongsun/vfe.pytorch

4 papers

open-mmlab/mmtracking

3 papers

3,372

lingyunwu14/STFT

2 papers

Latest papers

Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems

bomps4/multi_resolution_rescored_bytetrack • 17 Apr 2024

This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors.

Detection of Micromobility Vehicles in Urban Traffic Videos

sabrikhalil/micro_mobility_detection • 28 Feb 2024

Urban traffic environments present unique challenges for object detection, particularly with the increasing presence of micromobility vehicles like e-scooters and bikes.

Efficient One-stage Video Object Detection by Exploiting Temporal Consistency

guanxiongsun/vfe.pytorch • 14 Feb 2024

Based on the analysis, we present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD by exploiting the temporal consistency in video frames.

TDViT: Temporal Dilated Video Transformer for Dense Video Tasks

guanxiongsun/vfe.pytorch • 14 Feb 2024

Deep video models, such as 3D CNNs or video transformers, have achieved promising performance on sparse video tasks, i.e., predicting one result per video.

Spatio-temporal Prompting Network for Robust Video Feature Extraction

guanxiongsun/vfe.pytorch • ICCV 2023

Then, these video prompts are prepended to the patch embeddings of the current frame as the updated input for video feature extraction.

04 Feb 2024

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

guanxiongsun/vfe.pytorch • 18 Jan 2024

However, we argue that these memory structures are not efficient or sufficient because of two implied operations: (1) concatenating all features in memory for enhancement, leading to a heavy computational cost; (2) frame-wise memory updating, preventing the memory from capturing more temporal information.

DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection

sdroh1027/DiffusionVID • IEEE Access 2023

To effectively refine the box from the degraded images in the videos, we used three novel approaches: cascade refinement, dynamic core-set conditioning, and local batch refinement.

30 Oct 2023

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers

WISION-Lab/eventful-transformer • ICCV 2023

In this work, we exploit temporal redundancy between subsequent inputs to reduce the cost of Transformers for video processing.

25 Aug 2023
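
As a rough illustration of what "exploiting temporal redundancy" can mean, the sketch below recomputes an expensive per-token function only for tokens whose input changed noticeably since they were last processed. It is a toy example, not the Eventful Transformers implementation: the function name `update_tokens`, the max-absolute-difference change measure, and the `threshold` parameter are all assumptions made for illustration.

```python
def update_tokens(reference, current, cache, threshold, expensive_fn):
    """Recompute expensive_fn only for tokens that changed by more than threshold.

    reference: token vectors as they were when each cached output was computed
    current:   token vectors of the new frame
    cache:     cached expensive_fn outputs, one per token
    All names and the change measure are hypothetical, chosen for illustration.
    """
    new_reference, new_cache, recomputed = [], [], 0
    for ref, cur, old in zip(reference, current, cache):
        # Largest per-component change between the stored and the new token.
        change = max(abs(c - r) for c, r in zip(cur, ref))
        if change > threshold:
            # Token changed enough: pay for the computation and refresh the reference.
            new_reference.append(cur)
            new_cache.append(expensive_fn(cur))
            recomputed += 1
        else:
            # Token is (nearly) unchanged: reuse the cached output.
            new_reference.append(ref)
            new_cache.append(old)
    return new_reference, new_cache, recomputed


def double(token):
    # Stand-in for a costly per-token operation.
    return [2 * x for x in token]


frame0 = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
frame1 = [[0.0, 0.05], [1.0, 1.0], [3.0, 2.0]]
cache0 = [double(t) for t in frame0]
ref1, cache1, n = update_tokens(frame0, frame1, cache0, 0.1, double)
print(n, cache1)
```

With a threshold of 0.1, only the third token is recomputed in this example; the other two reuse their cached outputs, which is where the savings would come from on largely static video.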

Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection

bingqingzhang/odd-vod • 22 Aug 2023

The ODD score enhances the VOD system in two ways: 1) it enables the VOD system to select superior global reference frames, thereby improving overall accuracy; and 2) it serves as an indicator in the newly designed ODD Scheduler to eliminate the aggregation of frames that are easy to detect, thus accelerating the VOD process.

Identity-Consistent Aggregation for Video Object Detection

bladewaltz1/clipvid • ICCV 2023

In Video Object Detection (VID), a common practice is to leverage the rich temporal contexts from the video to enhance the object representations in each frame.

15 Aug 2023
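
The "common practice" of using temporal context to enhance per-frame object representations can be sketched, very loosely, as a similarity-weighted average of feature vectors from neighbouring frames. The code below is a minimal sketch under that assumption and is not the method of the paper above: the function name `aggregate`, the dot-product similarity, and the softmax weighting are illustrative choices.

```python
import math


def aggregate(current, supports):
    """Blend the current-frame feature with support-frame features.

    Weights are a softmax over dot-product similarity to the current feature,
    so support features that resemble the current one contribute more.
    Hypothetical helper for illustration only.
    """
    candidates = [current] + supports
    scores = [sum(a * b for a, b in zip(current, c)) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    enhanced = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(current))
    ]
    return enhanced, weights


# The first support frame matches the current feature, the second does not.
enhanced, weights = aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(enhanced, weights)
```

In this example the dissimilar support feature receives the smallest weight, so the enhanced feature stays close to the current one.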
