Object Detection
3722 papers with code • 91 benchmarks • 262 datasets
Object Detection is a computer vision task in which the goal is to detect and locate objects of interest in an image or video. The task involves identifying the position and boundaries of objects in an image, and classifying the objects into different categories. It forms a crucial part of vision recognition, alongside image classification and retrieval.
The state-of-the-art methods can be categorized into two main types: one-stage methods and two stage-methods:
-
One-stage methods prioritize inference speed, and example models include YOLO, SSD and RetinaNet.
-
Two-stage methods prioritize detection accuracy, and example models include Faster R-CNN, Mask R-CNN and Cascade R-CNN.
The most popular benchmark is the MSCOCO dataset. Models are typically evaluated according to a Mean Average Precision metric.
( Image credit: Detectron )
Libraries
Use these libraries to find Object Detection models and implementationsDatasets
Subtasks
- 3D Object Detection
- Real-Time Object Detection
- RGB Salient Object Detection
- Few-Shot Object Detection
- Few-Shot Object Detection
- Video Object Detection
- RGB-D Salient Object Detection
- Open Vocabulary Object Detection
- Object Detection In Aerial Images
- Weakly Supervised Object Detection
- Robust Object Detection
- Small Object Detection
- Medical Object Detection
- Zero-Shot Object Detection
- Open World Object Detection
- Co-Salient Object Detection
- Dense Object Detection
- Object Proposal Generation
- Video Salient Object Detection
- Camouflaged Object Segmentation
- License Plate Detection
- Head Detection
- Multiview Detection
- 3D Object Detection From Monocular Images
- One-Shot Object Detection
- Moving Object Detection
- Surgical tool detection
- Described Object Detection
- Body Detection
- Pupil Detection
- Object Detection In Indoor Scenes
- Class-agnostic Object Detection
- Semantic Part Detection
- Object Skeleton Detection
- Fish Detection
- Multiple Affordance Detection
- Weakly Supervised 3D Detection
Most implemented papers
Deep High-Resolution Representation Learning for Visual Recognition
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
YOLOX: Exceeding YOLO Series in 2021
In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector -- YOLOX.
Deep High-Resolution Representation Learning for Human Pose Estimation
We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the mutli-resolution subnetworks in parallel.
High-Resolution Representations for Labeling Pixels and Regions
The proposed approach achieves superior results to existing single-model networks on COCO object detection.
Deformable Convolutional Networks
Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in its building modules.
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e. g., 10-150 MFLOPs).
End-to-End Object Detection with Transformers
We present a new method that views object detection as a direct set prediction problem.
Spatial Memory for Context Reasoning in Object Detection
On the other hand, modeling object-object relationships requires {\bf spatial} reasoning -- not only do we need a memory to store the spatial layout, but also a effective reasoning module to extract spatial patterns.
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
Datasets, Transforms and Models specific to Computer Vision
ResNeSt: Split-Attention Networks
It is well known that featuremap attention and multi-path representation are important for visual recognition.