Video Object Detection
66 papers with code • 7 benchmarks • 10 datasets
Video object detection is the task of detecting objects in video rather than in still images.
(Image credit: Learning Motion Priors for Efficient Video Object Detection)
Latest papers
Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems
This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors.
Detection of Micromobility Vehicles in Urban Traffic Videos
Urban traffic environments present unique challenges for object detection, particularly with the increasing presence of micromobility vehicles like e-scooters and bikes.
Efficient One-stage Video Object Detection by Exploiting Temporal Consistency
Based on the analysis, we present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD by exploiting the temporal consistency in video frames.
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
Deep video models, for example, 3D CNNs or video transformers, have achieved promising performance on sparse video tasks, i.e., predicting one result per video.
Spatio-temporal Prompting Network for Robust Video Feature Extraction
Then, these video prompts are prepended to the patch embeddings of the current frame as the updated input for video feature extraction.
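The prepending step in this snippet can be illustrated with a minimal sketch. It only shows the sequence concatenation. How the paper actually generates and updates its prompts is not covered here, and the function name `prepend_prompts`, the toy values, and the embedding size of 4 are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def prepend_prompts(prompts: List[Vector], patch_embeddings: List[Vector]) -> List[Vector]:
    """Return a new token sequence with the prompt tokens placed before the patch tokens.

    Hypothetical helper: both inputs are lists of equal-length embedding vectors.
    """
    if prompts and patch_embeddings and len(prompts[0]) != len(patch_embeddings[0]):
        raise ValueError("prompts and patch embeddings must share one embedding size")
    # Copy each vector so the caller's lists are not aliased by the result.
    return [list(p) for p in prompts] + [list(e) for e in patch_embeddings]


# Toy input: 2 made-up video prompts and 3 patch tokens, embedding size 4.
video_prompts = [[0.1] * 4, [0.2] * 4]
patches = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
tokens = prepend_prompts(video_prompts, patches)
```

The resulting sequence has the prompts at positions 0 and 1 and the patch tokens after them, which is the shape a transformer backbone would then consume.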
MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection
However, we argue that these memory structures are not efficient or sufficient because of two implied operations: (1) concatenating all features in memory for enhancement, leading to a heavy computational cost; (2) frame-wise memory updating, preventing the memory from capturing more temporal information.
DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection
To effectively refine the box from the degraded images in the videos, we used three novel approaches: cascade refinement, dynamic core-set conditioning, and local batch refinement.
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
In this work, we exploit temporal redundancy between subsequent inputs to reduce the cost of Transformers for video processing.
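One way to picture reuse under temporal redundancy is a cache that recomputes a per-token operation only for tokens that changed noticeably since they were last processed. The sketch below is a simplified illustration, not the paper's method: the class name `TokenGate`, the fixed distance threshold, and the toy per-token function are assumptions, and operations that mix tokens (such as attention) would need additional handling.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


class TokenGate:
    """Caches per-token outputs and recomputes only tokens that moved past a threshold."""

    def __init__(self, fn: Callable[[Vector], Vector], threshold: float) -> None:
        self.fn = fn                      # per-token operation (assumed expensive)
        self.threshold = threshold        # hypothetical change threshold (Euclidean distance)
        self.reference: Optional[List[Vector]] = None  # token values at last recomputation
        self.outputs: List[Vector] = []   # cached outputs, one per token
        self.recomputed = 0               # tokens recomputed on the latest call

    def __call__(self, tokens: List[Vector]) -> List[Vector]:
        # First frame (or a change in token count): compute everything.
        if self.reference is None or len(self.reference) != len(tokens):
            self.reference = [list(t) for t in tokens]
            self.outputs = [self.fn(t) for t in tokens]
            self.recomputed = len(tokens)
            return list(self.outputs)
        self.recomputed = 0
        for i, tok in enumerate(tokens):
            # Recompute only where the token differs enough from its reference.
            if math.dist(tok, self.reference[i]) > self.threshold:
                self.reference[i] = list(tok)
                self.outputs[i] = self.fn(tok)
                self.recomputed += 1
        return list(self.outputs)


def double(token: Vector) -> Vector:
    """Stand-in for an expensive per-token layer."""
    return [2.0 * x for x in token]


gate = TokenGate(double, threshold=0.5)
frame1 = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
frame2 = [[0.0, 0.1], [1.0, 1.0], [5.0, 2.0]]  # only the last token changes much
out1 = gate(frame1)
first_count = gate.recomputed
out2 = gate(frame2)
second_count = gate.recomputed
```

On the second frame only one of the three tokens is recomputed. The first token reuses a slightly stale cached output, which is the accuracy-for-compute trade-off this kind of gating accepts.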
Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection
The ODD score enhances the VOD system in two ways: 1) it enables the VOD system to select superior global reference frames, thereby improving overall accuracy; and 2) it serves as an indicator in the newly designed ODD Scheduler to eliminate the aggregation of frames that are easy to detect, thus accelerating the VOD process.
Identity-Consistent Aggregation for Video Object Detection
In Video Object Detection (VID), a common practice is to leverage the rich temporal contexts from the video to enhance the object representations in each frame.
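The common practice described here, enhancing a frame's object representation with temporal context, is often realized as a similarity-weighted average over features from other frames. The following sketch shows that generic idea only and does not implement the identity-consistent aggregation this paper proposes. The function name `aggregate`, the dot-product similarity, and the toy feature vectors are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def aggregate(query: Vector, references: List[Vector]) -> Tuple[Vector, List[float]]:
    """Blend the query feature with reference features using softmax(dot-product) weights.

    Returns the enhanced feature and the weights (query first, then references).
    """
    candidates = [query] + references
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    enhanced = [
        sum(w * cand[d] for w, cand in zip(weights, candidates))
        for d in range(len(query))
    ]
    return enhanced, weights


# Toy features: one object in the current frame, two candidates from other frames.
current = [1.0, 0.0]
others = [[0.9, 0.1], [0.0, 1.0]]  # the first is similar, the second is not
enhanced, weights = aggregate(current, others)
```

The dissimilar candidate receives the smallest weight, so it contributes least to the enhanced feature. Methods in this area differ mainly in which reference features they admit and how they compute the weights.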