Video object detection is the task of detecting objects in video rather than in still images.
(Image credit: Learning Motion Priors for Efficient Video Object Detection)
We argue that there are two important cues for humans to recognize objects in videos: the global semantic information and the local localization information.
Weakly supervised learning has emerged as a compelling tool for object detection by reducing the need for strong supervision during training.
In this paper, we propose an end-to-end online 3D video object detector that operates on point cloud sequences.
Consecutive frames in a video are highly redundant.
As the tracker reuses the features from the detector, it is a very lightweight addition to the detection network.
From a robotics perspective, recall continuity and localization stability are as important as accuracy, but AP alone is insufficient to reflect a detector's performance over time.
Recently, image-level flow warping has been proposed to propagate features across different frames, aiming at achieving a better trade-off between accuracy and efficiency.
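As a rough illustration of flow-based feature propagation, the sketch below warps a key-frame feature map to the current frame by bilinear sampling along a flow field. It is a minimal NumPy sketch, not the method of any listed paper: the function name `warp_features`, the `(H, W, C)` layout, and the convention that `flow` stores per-pixel `(dx, dy)` offsets pointing from the current frame back into the key frame are all assumptions made for this example.

```python
import numpy as np

def warp_features(key_feat, flow):
    """Warp key-frame features to the current frame (hypothetical helper).

    key_feat: (H, W, C) feature map of the key frame.
    flow:     (H, W, 2) assumed to hold (dx, dy) offsets from each
              current-frame location to its source in the key frame.
    """
    H, W, _ = key_feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    # Source coordinates in the key frame, clamped to the image border.
    sx = np.clip(xs + flow[..., 0], 0, W - 1)
    sy = np.clip(ys + flow[..., 1], 0, H - 1)

    # Integer corners surrounding each source coordinate.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)

    # Bilinear interpolation weights, broadcast over channels.
    wx = (sx - x0)[..., None]
    wy = (sy - y0)[..., None]

    top = (1 - wx) * key_feat[y0, x0] + wx * key_feat[y0, x1]
    bottom = (1 - wx) * key_feat[y1, x0] + wx * key_feat[y1, x1]
    return (1 - wy) * top + wy * bottom
```

With an all-zero flow the function returns the key-frame features unchanged, which is a quick sanity check. In practice the flow would come from an optical flow estimator, which this sketch does not include.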
Average precision (AP) is a widely used metric to evaluate detection accuracy of image and video object detectors.
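For reference, AP for a single class can be computed as the area under the precision-recall curve. The sketch below uses the all-point interpolation variant (precision made monotonically non-increasing before integrating); benchmarks differ in the exact interpolation and in how detections are matched to ground truth, so treat this as one common formulation rather than the protocol of any particular dataset. The function name and arguments are hypothetical, and matching is assumed to have been done already, so each detection arrives with a true-positive flag.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class (illustrative sketch).

    scores: confidence score of each detection.
    is_tp:  True where the detection was matched to a ground-truth box
            (matching, e.g. by an IoU threshold, is assumed done upstream).
    num_gt: number of ground-truth boxes; assumed to be > 0.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # highest score first
    tp = np.asarray(is_tp, dtype=bool)[order]

    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(~tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)

    # Add sentinels, then take the precision envelope from right to left.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]

    # Sum rectangle areas wherever recall changes.
    idx = np.nonzero(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```

For example, three detections ranked true positive, false positive, true positive against two ground-truth boxes give an AP of 5/6. Because the score depends only on the ranking of detections pooled over all frames, it is unaffected by when in a video the misses occur.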
The latency reduction by this hard attention mechanism comes at the cost of degraded accuracy.
Instead of relying on optical flow, this paper proposes a novel module called Progressive Sparse Local Attention (PSLA), which establishes the spatial correspondence between features across frames in a local region with progressively sparser stride and uses the correspondence to propagate features.
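The idea of attending over a local neighbourhood whose sampling grows sparser with distance can be sketched as follows. This is a loose NumPy illustration and not the PSLA module itself: the offset pattern (0, ±1, ±2, ±4 along each axis), the dot-product similarity, the softmax normalisation, and both function names are assumptions chosen for this example.

```python
import numpy as np

def sparse_offsets(max_exp=2):
    """Offsets whose spacing grows with distance: 0, ±1, ±2, ±4, ...

    The pattern is an assumption for illustration only.
    """
    steps = [0] + [2 ** k for k in range(max_exp + 1)]
    one_d = sorted({s for v in steps for s in (v, -v)})
    return [(dy, dx) for dy in one_d for dx in one_d]

def propagate(key_feat, cur_feat, offsets):
    """Align key-frame features to the current frame by local attention.

    key_feat, cur_feat: (H, W, C) feature maps.
    For every location, key-frame features at the given offsets are
    compared with the current-frame feature and averaged using
    softmax-normalised similarity weights.
    """
    H, W, _ = key_feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    gathered, scores = [], []
    for dy, dx in offsets:
        yy = np.clip(ys + dy, 0, H - 1)        # clamp at the border
        xx = np.clip(xs + dx, 0, W - 1)
        neigh = key_feat[yy, xx]               # (H, W, C)
        gathered.append(neigh)
        scores.append((neigh * cur_feat).sum(axis=-1))

    scores = np.stack(scores, axis=-1)         # (H, W, K)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    gathered = np.stack(gathered, axis=-2)     # (H, W, K, C)
    return (weights[..., None] * gathered).sum(axis=-2)
```

With the default `max_exp=2` the neighbourhood covers a 9x9 window using 49 sampled positions instead of 81, which is where the cost saving of a sparser stride comes from. No optical flow is involved; the correspondence is inferred from feature similarity alone.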