Object Detection
3645 papers with code • 84 benchmarks • 251 datasets
Object Detection is a computer vision task in which the goal is to detect and locate objects of interest in an image or video. The task involves identifying the position and boundaries of objects in an image, and classifying the objects into different categories. It forms a crucial part of vision recognition, alongside image classification and retrieval.
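Because the task is defined in terms of object positions and boundaries, localization quality is conventionally scored with Intersection-over-Union (IoU) between a predicted box and a ground-truth box. A minimal sketch for axis-aligned `(x1, y1, x2, y2)` boxes (function name is illustrative, not from any particular library):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping in a 5x5 region: IoU = 25 / 175
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

A detection is usually counted as correct only if its IoU with a ground-truth box exceeds a threshold (0.5 is the classic choice).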
The state-of-the-art methods can be categorized into two main types: one-stage methods and two-stage methods:
- One-stage methods prioritize inference speed; example models include YOLO, SSD, and RetinaNet.
- Two-stage methods prioritize detection accuracy; example models include Faster R-CNN, Mask R-CNN, and Cascade R-CNN.
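Both families produce many overlapping candidate boxes per object, so detectors typically finish with non-maximum suppression (NMS): keep the highest-scoring box and discard nearby boxes that overlap it too much. A hedged, pure-Python sketch (greedy NMS over `(score, box)` pairs; real pipelines use optimized ops such as `torchvision.ops.nms`):

```python
def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    overlaps every already-kept box by less than iou_threshold."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(_iou(box, k) < iou_threshold for _, k in kept):
            kept.append((score, box))
    return kept

dets = [(0.9, (0, 0, 10, 10)),   # best box for object A
        (0.8, (1, 1, 11, 11)),   # near-duplicate of A, suppressed
        (0.7, (20, 20, 30, 30))]  # object B, kept
print(nms(dets))
```

The IoU threshold trades duplicate suppression against recall for tightly packed objects.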
The most popular benchmark is the MS COCO dataset. Models are typically evaluated by mean Average Precision (mAP).
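mAP averages a per-class Average Precision, the area under the precision-recall curve traced as the detection-confidence threshold is lowered. A simple all-points-interpolation sketch for one class, assuming detections have already been matched to ground truth (the COCO metric additionally averages over IoU thresholds 0.50:0.95; names here are illustrative):

```python
def average_precision(scored_hits, num_gt):
    """Area under the precision-recall curve for one class.

    scored_hits: list of (confidence, is_true_positive) for every detection.
    num_gt: number of ground-truth boxes of this class.
    """
    tp = fp = 0
    ap = last_recall = 0.0
    for _, is_tp in sorted(scored_hits, reverse=True):  # descending confidence
        if is_tp:
            tp += 1
        else:
            fp += 1
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += (recall - last_recall) * precision  # rectangle under the PR curve
        last_recall = recall
    return ap

# TP at 0.9, FP at 0.8, TP at 0.7, with 2 ground-truth boxes -> AP = 5/6
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], 2))
```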
(Image credit: Detectron)
Libraries
Use these libraries to find Object Detection models and implementations
Subtasks
- 3D Object Detection
- Real-Time Object Detection
- RGB Salient Object Detection
- Few-Shot Object Detection
- Video Object Detection
- RGB-D Salient Object Detection
- Object Detection In Aerial Images
- Weakly Supervised Object Detection
- Open Vocabulary Object Detection
- Robust Object Detection
- Small Object Detection
- Medical Object Detection
- Zero-Shot Object Detection
- Co-Salient Object Detection
- Object Proposal Generation
- Dense Object Detection
- Video Salient Object Detection
- Open World Object Detection
- Camouflaged Object Segmentation
- License Plate Detection
- Head Detection
- Multiview Detection
- One-Shot Object Detection
- Moving Object Detection
- Surgical tool detection
- 3D Object Detection From Monocular Images
- Body Detection
- Pupil Detection
- Object Detection In Indoor Scenes
- Described Object Detection
- Semantic Part Detection
- Class-agnostic Object Detection
- Object Skeleton Detection
- Fish Detection
- Multiple Affordance Detection
Latest papers
Benchmarking Object Detectors with COCO: A New Path Forward
With these findings, we advocate using COCO-ReM for future object detection research.
Ship in Sight: Diffusion Models for Ship-Image Super Resolution
In this context, our method explores in depth the problem of ship image super resolution, which is crucial for coastal and port surveillance.
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
In this paper, we further adapt the selective scanning process of Mamba to the visual domain, enhancing its ability to learn features from two-dimensional images by (i) a continuous 2D scanning process that improves spatial continuity by ensuring adjacency of tokens in the scanning sequence, and (ii) direction-aware updating which enables the model to discern the spatial relations of tokens by encoding directional information.
UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection with Sparse LiDAR and Large Domain Gaps
In this study, we address a gap in existing unsupervised domain adaptation approaches on LiDAR-based 3D object detection, which have predominantly concentrated on adapting between established, high-density autonomous driving datasets.
Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions
The robustness of driving perception systems under unprecedented conditions is crucial for safety-critical usages.
Multiple Object Tracking as ID Prediction
In Multiple Object Tracking (MOT), tracking-by-detection methods have stood the test for a long time, which split the process into two parts according to the definition: object detection and association.
RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection
In the dual-stream radar backbone, a point-based encoder and a transformer-based encoder are proposed to extract radar features, with an injection and extraction module to facilitate communication between the two encoders.
FOOL: Addressing the Downlink Bottleneck in Satellite Computing with Neural Feature Compression
Further, it embeds context and leverages inter-tile dependencies to lower transfer costs with negligible overhead.
SFOD: Spiking Fusion Object Detector
Thereby, we establish state-of-the-art classification results based on SNNs, achieving 93.7% accuracy on the NCAR dataset.
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
HSF applies Point-to-Grid and Grid-to-Region transformers to capture the multimodal scene context at different granularities.