Video Object Detection

66 papers with code • 7 benchmarks • 10 datasets

Video object detection is the task of detecting objects in video rather than in still images.

(Image credit: Learning Motion Priors for Efficient Video Object Detection)

Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems

bomps4/multi_resolution_rescored_bytetrack 17 Apr 2024

This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors.

1
17 Apr 2024

Detection of Micromobility Vehicles in Urban Traffic Videos

sabrikhalil/micro_mobility_detection 28 Feb 2024

Urban traffic environments present unique challenges for object detection, particularly with the increasing presence of micromobility vehicles like e-scooters and bikes.

2
28 Feb 2024

Efficient One-stage Video Object Detection by Exploiting Temporal Consistency

guanxiongsun/vfe.pytorch 14 Feb 2024

Based on the analysis, we present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD by exploiting the temporal consistency in video frames.

16
14 Feb 2024

TDViT: Temporal Dilated Video Transformer for Dense Video Tasks

guanxiongsun/vfe.pytorch 14 Feb 2024

Deep video models, for example, 3D CNNs or video transformers, have achieved promising performance on sparse video tasks, i.e., predicting one result per video.

16
14 Feb 2024

Spatio-temporal Prompting Network for Robust Video Feature Extraction

guanxiongsun/vfe.pytorch ICCV 2023

Then, these video prompts are prepended to the patch embeddings of the current frame as the updated input for video feature extraction.

16
04 Feb 2024
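
The prepending step mentioned in the Spatio-temporal Prompting Network summary above can be illustrated with a minimal sketch. It only shows the sequence-level operation (prompt tokens placed in front of the current frame's patch tokens) and says nothing about how the paper actually generates its prompts. The function name `prepend_prompts`, the toy dimensions, and the values are assumptions made for illustration and do not come from the paper or its repository.

```python
from typing import List

Vector = List[float]


def prepend_prompts(prompts: List[Vector], patch_embeddings: List[Vector]) -> List[Vector]:
    """Return a new token sequence: prompt tokens first, then patch tokens.

    Hypothetical helper for illustration; not an API of any library.
    """
    # All tokens must share one embedding dimension to form a single sequence.
    if prompts and patch_embeddings and len(prompts[0]) != len(patch_embeddings[0]):
        raise ValueError("prompts and patch embeddings must share the embedding dimension")
    # Copy each vector so the inputs are left untouched.
    return [list(p) for p in prompts] + [list(e) for e in patch_embeddings]


# Toy example (assumed sizes): 2 prompt tokens, 3 patch tokens, dimension 4.
video_prompts = [[0.1] * 4, [0.2] * 4]
frame_patches = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
tokens = prepend_prompts(video_prompts, frame_patches)
print(len(tokens))  # 5 tokens: the 2 prompts followed by the 3 patches
```

In a real model the combined sequence would then be fed to the transformer backbone in place of the patch embeddings alone.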

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

guanxiongsun/vfe.pytorch 18 Jan 2024

However, we argue that these memory structures are not efficient or sufficient because of two implied operations: (1) concatenating all features in memory for enhancement, leading to a heavy computational cost; (2) frame-wise memory updating, preventing the memory from capturing more temporal information.

16
18 Jan 2024

DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection

sdroh1027/DiffusionVID IEEE Access 2023

To effectively refine the box from the degraded images in the videos, we used three novel approaches: cascade refinement, dynamic core-set conditioning, and local batch refinement.

26
30 Oct 2023

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers

WISION-Lab/eventful-transformer ICCV 2023

In this work, we exploit temporal redundancy between subsequent inputs to reduce the cost of Transformers for video processing.

29
25 Aug 2023
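
One generic way to exploit temporal redundancy between subsequent inputs, in the spirit of the Eventful Transformers summary above, is to reuse cached per-token results when a token has barely changed since the previous frame. The sketch below is a simplification under stated assumptions, not the method from the paper: the function `update_changed_tokens`, the Euclidean change measure, and the fixed `threshold` are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def update_changed_tokens(
    prev_inputs: List[Vector],
    prev_outputs: List[Vector],
    new_inputs: List[Vector],
    fn: Callable[[Vector], Vector],
    threshold: float,
) -> Tuple[List[Vector], List[int]]:
    """Recompute fn only for tokens that moved more than `threshold`.

    Returns the updated outputs and the indices that were recomputed.
    Hypothetical helper for illustration; not an API of any library.
    """
    if not (len(prev_inputs) == len(prev_outputs) == len(new_inputs)):
        raise ValueError("all token lists must have the same length")
    outputs = list(prev_outputs)  # start from the cached results
    recomputed = []
    for i, (old, new) in enumerate(zip(prev_inputs, new_inputs)):
        # Euclidean distance between the old and new token (assumed metric).
        change = sum((a - b) ** 2 for a, b in zip(old, new)) ** 0.5
        if change > threshold:
            outputs[i] = fn(new)
            recomputed.append(i)
    return outputs, recomputed


def double(token: Vector) -> Vector:
    """Stand-in for an expensive per-token operation."""
    return [2.0 * x for x in token]


frame_a = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
frame_b = [[0.0, 0.05], [3.0, 1.0], [2.0, 2.0]]  # only token 1 changes a lot
cached = [double(t) for t in frame_a]
outputs, recomputed = update_changed_tokens(frame_a, cached, frame_b, double, threshold=0.5)
print(recomputed)  # [1]
```

The saving comes from skipping `fn` for unchanged tokens; the cost is that slightly changed tokens keep a stale result, so the threshold trades accuracy for compute.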

Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection

bingqingzhang/odd-vod 22 Aug 2023

The ODD score enhances the VOD system in two ways: 1) it enables the VOD system to select superior global reference frames, thereby improving overall accuracy; and 2) it serves as an indicator in the newly designed ODD Scheduler to eliminate the aggregation of frames that are easy to detect, thus accelerating the VOD process.

5
22 Aug 2023

Identity-Consistent Aggregation for Video Object Detection

bladewaltz1/clipvid ICCV 2023

In Video Object Detection (VID), a common practice is to leverage the rich temporal contexts from the video to enhance the object representations in each frame.

2
15 Aug 2023