Video Object Detection
66 papers with code • 7 benchmarks • 10 datasets
Video object detection is the task of detecting objects in video rather than in still images.
(Image credit: Learning Motion Priors for Efficient Video Object Detection)
Latest papers
Video Sparse Transformer With Attention-Guided Memory for Video Object Detection
In this paper, we enhance features element-wise before the object candidate region detection, proposing Video Sparse Transformer with Attention-guided Memory (VSTAM).
Representation Recycling for Streaming Video Analysis
Our experiments on video semantic segmentation, video object detection, and human pose estimation in videos show that StreamDEQ achieves on-par accuracy with the baseline while being more than 2-4x faster.
Delta Distillation for Efficient Video Processing
By extensive experiments on a wide range of architectures, including the most efficient ones, we demonstrate that delta distillation sets a new state of the art in terms of accuracy vs. efficiency trade-off for semantic segmentation and object detection in videos.
TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers
Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating performance as good as that of previous complex hand-crafted detectors.
TYolov5: A Temporal Yolov5 Detector Based on Quasi-Recurrent Neural Networks for Real-Time Handgun Detection in Video
Much of the previous research on handgun detection is based on static image detectors, leaving aside valuable temporal information that could be used to improve object detection in videos.
AI Accelerator Survey and Trends
Over the past several years, new machine learning accelerators have been announced and released every month for a variety of applications, including speech recognition, video object detection, assisted driving, and many data center applications.
FFAVOD: Feature Fusion Architecture for Video Object Detection
We propose FFAVOD, standing for feature fusion architecture for video object detection.
Temporal RoI Align for Video Object Recognition
In this work, considering that the features of the same object instance are highly similar among frames in a video, a novel Temporal RoI Align operator is proposed to extract features from other frames' feature maps for current-frame proposals by utilizing feature similarity.
TF-Blender: Temporal Feature Blender for Video Object Detection
One popular solution is to exploit temporal information and enhance the per-frame representation by aggregating features from neighboring frames.
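As a rough illustration of the general idea of aggregating features from neighboring frames, the sketch below blends each frame's feature vector with those of its neighbors, weighting each neighbor by its cosine similarity to the current frame. This is a generic toy example and not the TF-Blender method: the function names `cosine` and `aggregate`, the `radius` parameter, the softmax weighting, and the two-dimensional feature vectors are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def aggregate(frames, t, radius=1):
    """Blend frame t's features with frames in [t - radius, t + radius].

    Weights are a softmax over cosine similarity to frame t, so neighbors
    that resemble the current frame contribute more (hypothetical scheme).
    """
    lo, hi = max(0, t - radius), min(len(frames), t + radius + 1)
    window = frames[lo:hi]
    scores = [math.exp(cosine(frames[t], f)) for f in window]
    total = sum(scores)
    weights = [s / total for s in scores]
    dim = len(frames[t])
    return [sum(w * f[d] for w, f in zip(weights, window)) for d in range(dim)]

# Toy per-frame feature vectors (made-up values, one vector per frame).
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
enhanced = aggregate(frames, t=1, radius=1)
```

In a real detector the vectors would be feature maps from a backbone network and the weights would typically be learned; the fixed similarity weighting here only shows where temporal context enters the per-frame representation.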
End-to-End Video Object Detection with Spatial-Temporal Transformers
Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating performance as good as that of previous complex hand-crafted detectors.